Overview

I had not planned to read this code, but I recently needed to solve a problem: after Prometheus restarts, an active alert is automatically resolved and then has to wait for a certain period of time before it triggers again. My first instinct was to patch the code myself, but this looked like a common case, so I wanted to find out whether the community would accept such a change and read the official issues first.

As it turns out, the problem has already been solved upstream; I am still suffering from it only because we are not following the community version (hopefully this gap will be closed). So here is a quick look at what the community did, which turned out to be pretty much what I had in mind.

Loading Alerting Rules

  • The code where the loading logic resides: rules/manager.go.
  • Loading Process
    • Load configuration files: m.loadGroups(interval, files...)
    • Update old configuration: update logic
      • Delete the old alert from the old groups.
      • Disable the old alerting rule.
      • newg.copyState(oldg)
      • Run newgroup after <-m.block: newg.run(m.opts.Context)
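
The update steps above can be sketched roughly as follows. This is a minimal illustration, not the code in rules/manager.go: the group type, its fields, and the update function are hypothetical stand-ins, and the alert state is reduced to a plain map.

```go
package main

import "fmt"

// group is a hypothetical, stripped-down stand-in for a rule group.
type group struct {
	name    string
	active  map[string]string // alert key -> state, e.g. "pending" or "firing"
	stopped bool
}

// stop marks the group as no longer running.
func (g *group) stop() { g.stopped = true }

// copyState carries the in-memory alert state over from the old group.
func (g *group) copyState(from *group) {
	for k, v := range from.active {
		g.active[k] = v
	}
}

// update swaps in freshly loaded groups. A group that still exists inherits
// the state of its predecessor; a group that disappeared is simply stopped.
func update(old, loaded map[string]*group) map[string]*group {
	for name, newg := range loaded {
		if oldg, ok := old[name]; ok {
			oldg.stop()
			newg.copyState(oldg)
			delete(old, name)
		}
	}
	for _, oldg := range old {
		oldg.stop()
	}
	return loaded
}

func main() {
	old := map[string]*group{
		"node": {name: "node", active: map[string]string{"HighCPU{host=a}": "firing"}},
		"gone": {name: "gone", active: map[string]string{}},
	}
	loaded := map[string]*group{
		"node": {name: "node", active: map[string]string{}},
	}
	current := update(old, loaded)
	fmt.Println(current["node"].active["HighCPU{host=a}"]) // firing
}
```

The point of the sketch is that a configuration reload alone does not lose pending or firing alerts, because the state is copied in memory. A process restart is different, which is what the rest of this post is about.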

group copyState

  • g.evaluationTime = from.evaluationTime // How long did it take to execute this alerting rule last time?
  • g.seriesInPreviousEval[i] = from.seriesInPreviousEval[fi]
  • ar.active[fp] = a // The pending and firing alerts are carried over here.

What group run does

defer close(g.terminated)
... ...
for {
... ...
                missed := (time.Since(lastTriggered).Nanoseconds() / g.interval.Nanoseconds()) - 1
                if missed > 0 {
                    iterationsMissed.Add(float64(missed))
                    iterationsScheduled.Add(float64(missed))
                }
                lastTriggered = time.Now()
                iter()
... ...
}

Here, iter is what actually evaluates the alerting rules of the group.

    iter := func() {
        iterationsScheduled.Inc()

        start := time.Now()
        g.Eval(ctx, start)

        iterationDuration.Observe(time.Since(start).Seconds())
        g.SetEvaluationTime(time.Since(start))
    }

What group stop does

It does very little:

func (g *Group) stop() {
    close(g.done)
    <-g.terminated
}

Reloading Alerting Rules

Reloading follows the same path as loading alerting rules.

Evaluating Alerting Rules

  • Note:
    • The query of an alerting rule may return multiple series, so the alert state has to be tracked separately for each one.
    • The active map records the alert state of each of those series.
  • Some Important Attributes
    • Value: the value returned by the last evaluation of the PromQL expression.
    • ActiveAt, FiredAt, ResolvedAt, LastSentAt: as the field names suggest.

Implementation Process

  • For each alerting rule, do this: vector, err := rule.Eval(ctx, ts, g.opts.QueryFunc, g.opts.ExternalURL)
  • The fingerprint of each resulting alert is recorded in: resultFPs := map[uint64]struct{}{}
  • Record the alerts:

              // Check whether we already have alerting state for the identifying label set.
              // Update the last value and annotations if so, create a new alert entry otherwise.
              if alert, ok := r.active[h]; ok && alert.State != StateInactive {
                  alert.Value = smpl.V
                  alert.Annotations = annotations
                  continue
              }
    
              r.active[h] = &Alert{
                  Labels:      lb.Labels(),
                  Annotations: annotations,
                  ActiveAt:    ts,
                  State:       StatePending,
                  Value:       smpl.V,
              }
    
  • Check if any pending alerts should be removed or fire

  • Notify alerts: g.opts.NotifyFunc(ctx, ar.vector.String(), ar.currentAlerts()...)
  • Save all the returned results to TSDB: app.Add(s.Metric, s.T, s.V)
    • s.Metric: alert labels
    • T: Current time
    • V: the value that triggers the alert
  • A series that is no longer exposed is marked stale by appending a NaN sample: app.Add(s.Metric, ts, NaN)

Send alert

The code is in: cmd/prometheus/main.go.

func sendAlerts(n *notifier.Manager, externalURL string) rules.NotifyFunc {
    return func(ctx context.Context, expr string, alerts ...*rules.Alert) error {
        var res []*notifier.Alert

        for _, alert := range alerts {
            if alert.State == rules.StatePending {
                continue
            }
            a := &notifier.Alert{
                StartsAt:     alert.FiredAt,
                Labels:       alert.Labels,
                Annotations:  alert.Annotations,
                GeneratorURL: externalURL + strutil.TableLinkForExpression(expr),
            }
            if !alert.ResolvedAt.IsZero() {
                a.EndsAt = alert.ResolvedAt
            }
            res = append(res, a)
        }

        if len(alerts) > 0 {
            n.Send(res...)
        }
        return nil
    }
}

And look at the real sending logic:

// notifier/notifier.go
    for _, a := range alerts {
        lb := labels.NewBuilder(a.Labels)

        for ln, lv := range n.opts.ExternalLabels {
            if a.Labels.Get(string(ln)) == "" {
                lb.Set(string(ln), string(lv))
            }
        }

        a.Labels = lb.Labels()
    }

    alerts = n.relabelAlerts(alerts)

    if d := (len(n.queue) + len(alerts)) - n.opts.QueueCapacity; d > 0 {
        n.queue = n.queue[d:]    // Is there a bug here?

        level.Warn(n.logger).Log("msg", "Alert notification queue full, dropping alerts", "num_dropped", d)
        n.metrics.dropped.Add(float64(d))
    }
    n.queue = append(n.queue, alerts...)

    // Notify sending goroutine that there are alerts to be processed.
    n.setMore()  // ---->  Then there are the various notification methods
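
On the "is there a bug here?" question: taken on its own, n.queue[d:] would slice out of range if d were larger than len(n.queue), which can only happen when a single batch exceeds the queue capacity. As far as I can tell the surrounding code trims the batch first, but treat that as my assumption. The sketch below shows the drop-oldest idea with both guards; enqueue is a hypothetical helper, not a notifier API.

```go
package main

import "fmt"

// enqueue appends batch to queue while keeping at most capacity items.
// The oldest items are dropped first. It returns the new queue and the
// number of dropped items.
func enqueue(queue, batch []string, capacity int) ([]string, int) {
	dropped := 0
	// A batch larger than the whole queue keeps only its newest items.
	if d := len(batch) - capacity; d > 0 {
		batch = batch[d:]
		dropped += d
	}
	// Make room by discarding the oldest queued items. After the guard
	// above, d can never exceed len(queue), so the slice is always valid.
	if d := len(queue) + len(batch) - capacity; d > 0 {
		queue = queue[d:]
		dropped += d
	}
	// Copy into a fresh slice so the caller's backing array is not reused.
	out := make([]string, 0, len(queue)+len(batch))
	out = append(out, queue...)
	out = append(out, batch...)
	return out, dropped
}

func main() {
	q, dropped := enqueue([]string{"a", "b", "c"}, []string{"d", "e"}, 4)
	fmt.Println(q, dropped) // [b c d e] 1
}
```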

Alert Resolution

Clear resolved alerts.

// rules/alerting.go
func (r *AlertingRule) Eval(ctx context.Context, ts time.Time, query QueryFunc, externalURL *url.URL) (promql.Vector, error) {
... ...
    for fp, a := range r.active {
        if _, ok := resultFPs[fp]; !ok {
            // If the alert was previously firing, keep it around for a given
            // retention time so it is reported as resolved to the AlertManager.
            if a.State == StatePending || (!a.ResolvedAt.IsZero() && ts.Sub(a.ResolvedAt) > resolvedRetention) {
                delete(r.active, fp)
            }
            if a.State != StateInactive {
                a.State = StateInactive
                a.ResolvedAt = ts
            }
            continue
        }

Set the EndsAt field

// cmd/prometheus/main.go
func sendAlerts(n *notifier.Manager, externalURL string) rules.NotifyFunc {
    return func(ctx context.Context, expr string, alerts ...*rules.Alert) error {
... ...
            if !alert.ResolvedAt.IsZero() {
                a.EndsAt = alert.ResolvedAt
            }

Keeping alerts alive across a restart

Saving alert state

// rules/alerting.go
func (r *AlertingRule) Eval(ctx context.Context, ts time.Time, query QueryFunc, externalURL *url.URL) (promql.Vector, error) {
... ...
        if r.restored {
            vec = append(vec, r.sample(a, ts))
            vec = append(vec, r.forStateSample(a, ts, float64(a.ActiveAt.Unix())))
        }
... ...

And stored in TSDB

// rules/manager.go
            for _, s := range vector {
                if _, err := app.Add(s.Metric, s.T, s.V); err != nil {

Load alert state

// rules/manager.go
func (g *Group) run(ctx context.Context) {
... ...
    if g.shouldRestore {
        select {
        case <-g.done:
            return
        case <-tick.C:
            missed := (time.Since(evalTimestamp) / g.interval) - 1
            if missed > 0 {
                g.metrics.iterationsMissed.Add(float64(missed))
                g.metrics.iterationsScheduled.Add(float64(missed))
            }
            evalTimestamp = evalTimestamp.Add((missed + 1) * g.interval)
            iter()
        }

        g.RestoreForState(time.Now())
        g.shouldRestore = false
    }

An interesting point here is that the alerting rules are evaluated once (iter()) before the alert state is restored, for the reason stated in the code comment.

The reason is that during the first evaluation (or before it) we may not have scraped enough data, and the recording rules, which some alerts may depend on, would not yet have been updated to their latest values.

Now let's look at the restoration itself. It starts by defining how far back to look; this is the value passed to Prometheus via the --rules.alert.for-outage-tolerance startup flag.

    maxtMS := int64(model.TimeFromUnixNano(ts.UnixNano()))
    // We allow restoration only if alerts were active before after certain time.
    mint := ts.Add(-g.opts.OutageTolerance)
    mintMS := int64(model.TimeFromUnixNano(mint.UnixNano()))
    q, err := g.opts.Queryable.Querier(g.opts.Context, mintMS, maxtMS)

Then the historical state of the alerts is checked one rule at a time (restoration is organized by alerting rule):

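for _, rule := range g.Rules() {
        // Only alerting rules carry state that can be restored.
        alertRule, ok := rule.(*AlertingRule)
        if !ok {
            continue
        }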
        alertHoldDuration := alertRule.HoldDuration()
        if alertHoldDuration < g.opts.ForGracePeriod {
            alertRule.SetRestored(true)
            continue
        }

The threshold here is the --rules.alert.for-grace-period startup flag: if the for duration of an alerting rule is shorter than this value, restoration is skipped for that rule.

        alertRule.ForEachActiveAlert(func(a *Alert) {

The interesting thing here is that restoration is driven by the alerts that are currently active in memory. This is why the rules are evaluated once before restoring: alerts whose rule has a for clause are then already in the active map, in the pending state. Don't worry, though: they will move to firing soon.

Here we build the label set of the series from the Alert and then query the corresponding persisted alert state:

            smpl := alertRule.forStateSample(a, time.Now(), 0)
            ... ...
            sset := q.Select(false, nil, matchers...)
            ... ...
            var t int64
            var v float64
            it := s.Iterator()
            for it.Next() {
                t, v = it.At()
            }

The alert is then parsed out and the state (Pending/Firing) of the alert is calculated:

            downAt := time.Unix(t/1000, 0).UTC()
            restoredActiveAt := time.Unix(int64(v), 0).UTC()
            timeSpentPending := downAt.Sub(restoredActiveAt)
            timeRemainingPending := alertHoldDuration - timeSpentPending

            if timeRemainingPending <= 0 {
                // The for duration had already elapsed before the restart, so the alert fires right away,
                // keeping the ActiveAt it had before the restart.
            } else if timeRemainingPending < g.opts.ForGracePeriod {
                // The logic here looks odd, but the code spells out the arithmetic.
                // My understanding: the new restoredActiveAt is chosen so that
                // restoredActiveAt + alertHoldDuration = ts + g.opts.ForGracePeriod,
                // i.e. the alert fires exactly one grace period after the restart.
                restoredActiveAt = ts.Add(g.opts.ForGracePeriod).Add(-alertHoldDuration)
            } else {
                // Here the time spent down is not counted toward the for duration.
                // For example, if for is 5m, the alert had been pending for 2m before the restart, and the restart took 2m,
                // then after the restart it still counts as 2m pending, not 2+2 = 4m.
                downDuration := ts.Sub(downAt)
                restoredActiveAt = restoredActiveAt.Add(downDuration)
            }
            a.ActiveAt = restoredActiveAt

As the code shows, Alertmanager is not notified immediately after a restart. The alert has to wait one more cycle, i.e. it triggers again only in the second evaluation cycle after the restart.