Common Use Cases

In Go programming, we often encounter scenarios where timeout handling is necessary. For instance, in network programming, we might reuse a connection to send multiple requests. Each request may have its own timeout, indicating how long to wait for a response before returning a timeout error. In such cases, we naturally think of the time.After function from the Go standard library and write code like the following:

  [root@liqiang.io]# cat main.go
  for req := range channel {
      conn.Write(req)
      req := req // capture the loop variable for the goroutine (needed before Go 1.22)
      go func() {
          select {
          case <-req.ch:
              // successfully received the response
              req.Ok()
          case <-time.After(req.Timeout):
              // timeout
              req.Err = timeout
              req.Cancel()
          }
      }()
  }

Common Pitfalls

At first glance, the above code seems logically sound. However, when running it in practice, we sometimes notice memory usage climbing steadily. Profiling with pprof points to time.After as the culprit. Reviewing the code reveals the problem: even after a request (req) receives a normal response, the timer allocated by time.After is not reclaimed immediately; it lingers until its timeout duration elapses. If the timeout is relatively long (on the order of minutes, say) and concurrency is high, memory usage can grow substantially.
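To see why, it helps to look at what time.After actually does. In the Go versions discussed here (up to and including Go 1.22), it is essentially a one-line wrapper that allocates a new Timer and returns only its channel. Because the caller never gets the *Timer value back, there is no way to call Stop on it, so the underlying runtime timer is only released once the full duration has elapsed. A rough sketch of that wrapper:

  // Roughly what time.After does in these Go versions:
  // the *Timer handle is discarded, so Stop can never be called on it.
  func After(d Duration) <-chan Time {
      return NewTimer(d).C
  }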

Solution

Once the problem is identified, the solution becomes apparent. From the implementation of time.After, we can see that it essentially creates a time.Timer. The time.Timer provides the following methods:

  [root@liqiang.io]# cat sleep.go
  func (t *Timer) Stop() bool
  func (t *Timer) Reset(d Duration) bool
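One caveat documented for these methods (in the Go versions discussed here): Stop does not close the timer's channel, and it returns false if the timer has already expired or been stopped, in which case the fired value may still be sitting in the channel's buffer. If a timer is going to be reused with Reset, the pattern suggested by the time package documentation is to drain the channel first, assuming nothing has received from it yet:

  // Drain-before-Reset idiom from the time package documentation,
  // assuming the program has not already received from t.C:
  if !t.Stop() {
      <-t.C
  }
  t.Reset(d)

In the select-based code below this drain is unnecessary, because the timeout branch itself is the receive from the timer's channel.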

Therefore, we can create the time.Timer manually and stop it ourselves on the success path, so the timer is released immediately rather than lingering until it would have fired, making resource usage more efficient. Here’s the updated code:

  [root@liqiang.io]# cat main.go
  for req := range channel {
      conn.Write(req)
      req := req // capture the loop variable for the goroutine (needed before Go 1.22)
      go func() {
          timeoutTimer := time.NewTimer(req.Timeout)
          select {
          case <-req.ch:
              // successfully received the response
              timeoutTimer.Stop() // release the timer right away instead of waiting for it to fire
              req.Ok()
          case <-timeoutTimer.C:
              // timeout
              req.Err = timeout
              req.Cancel()
          }
      }()
  }

Additional Insights

Timer Implementation

When inspecting the time.Timer source code, we find that startTimer is only declared as a function in sleep.go; its implementation is not there. Following Go’s source code conventions, this typically means the implementation lives in the runtime. Exploring Go’s source code repository reveals that the timer implementation has gone through several iterations, with particularly significant changes between Go 1.13 and Go 1.14:

  • Go 1.13

    • Uses 64 fixed timer buckets for load balancing.
    • The issue here is that a high number of timers increases the frequency of binding and unbinding P and M, reducing efficiency.
  • Go 1.14

    • Stores timers on each P in a quaternary (4-ary) min-heap.
    • Timers are triggered either through the scheduling loop or system monitoring, with added support for netpoll blocking wake-ups to ensure more timely execution. (This specific scenario is not entirely clear to me.)

As of Go 1.22, the implementation remains essentially the same as in Go 1.14: timer objects are added to the owning P’s timer heap, and timers that are due are run as part of scheduling.
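To make the split between the time package and the runtime concrete, here is a simplified sketch of roughly what time.NewTimer looks like in these Go versions (field names abridged, not verbatim source): the time package only fills in a runtimeTimer descriptor and hands it to startTimer, whose body lives in the runtime and registers the timer with a P.

  // Simplified sketch of time.NewTimer for the Go versions discussed here
  // (not verbatim source):
  func NewTimer(d Duration) *Timer {
      c := make(chan Time, 1)
      t := &Timer{
          C: c,
          r: runtimeTimer{
              when: when(d),  // absolute expiration time
              f:    sendTime, // callback run by the runtime: sends the time on c
              arg:  c,
          },
      }
      startTimer(&t.r)
      return t
  }

  // Declaration only in sleep.go; the implementation is in the runtime.
  func startTimer(*runtimeTimer)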

Timer Precision

After understanding Go’s timer implementation, another question arises: what level of precision does Go guarantee for its timers? Here are two factors that could affect precision:

  1. Goroutine Scheduling: Timers are checked and fired on a P as part of goroutine scheduling, so firing can be delayed by the execution time of whatever goroutines are currently running.
  2. Synchronous Callback Execution: If multiple timers need to be executed within the same cycle, later timers could be delayed by the execution time of earlier timers.

From Reference 2, I also learned about another potential precision factor relevant in high-demand scenarios:

  • Time Measurement Granularity: The timer’s execution precision also depends on how finely time is measured, i.e. whether times are tracked at millisecond (ms), microsecond (µs), or nanosecond (ns) resolution.

According to Go’s implementation:

  1. Scheduling Timing: Timers are fired from two paths, the goroutine scheduling loop and the system monitor (sysmon), which together give a firing granularity of roughly +10ms.
  2. Delay Guarantees: Go only guarantees that a timer fires no earlier than its deadline; how much later it fires is not bounded, which can be a significant hidden risk.
  3. Execution Precision: Whether a timer is due is decided by comparing its deadline against the system clock, which is tracked at nanosecond (ns) resolution.

Thus, if timer precision is critical, it may be more reliable to run timing tasks in a separate process rather than mixing them with business logic.
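To get a feel for the scheduling effect in particular, a small experiment (my own sketch, not from the original post) is to keep every P busy with CPU-bound goroutines and then measure how late short timers actually fire relative to their requested delay:

  // Sketch: measure how late timers fire when all Ps are busy.
  // Names and numbers here are illustrative only.
  package main

  import (
      "fmt"
      "runtime"
      "sync"
      "time"
  )

  func fib(n int) int {
      if n < 2 {
          return n
      }
      return fib(n-1) + fib(n-2)
  }

  func main() {
      // Saturate every P with CPU-bound work so timer callbacks must
      // compete with running goroutines.
      for i := 0; i < runtime.GOMAXPROCS(0); i++ {
          go func() {
              for {
                  _ = fib(30)
              }
          }()
      }

      const d = 50 * time.Millisecond
      var wg sync.WaitGroup
      for i := 0; i < 10; i++ {
          wg.Add(1)
          start := time.Now()
          time.AfterFunc(d, func() {
              // Lateness = actual elapsed time minus the requested delay.
              fmt.Println("late by:", time.Since(start)-d)
              wg.Done()
          })
      }
      wg.Wait()
  }

The exact numbers depend on load, Go version, and OS; the one guarantee is that a timer never fires before its deadline.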

Verification Code

I wrote a test case to verify this behavior: GitHub Link. The following code was used:

  [root@liqiang.io]# cat main.go
  for {
      time.Sleep(time.Millisecond * 10)
      go func() {
          t := time.NewTimer(time.Minute * 3)
          select {
          case res := <-ch1:
              t.Stop()
              fmt.Println(res)
          case <-t.C:
              fmt.Println("timeout")
          }
      }()
  }

Using the top command to monitor memory, we observe that memory usage remains stable regardless of how long the code runs. However, replacing the code with the following snippet results in steadily increasing memory usage (though capped at a certain level):

  [root@liqiang.io]# cat main.go
  for {
      time.Sleep(time.Millisecond * 10)
      go func() {
          select {
          case res := <-ch1:
              fmt.Println(res)
          case <-time.After(time.Minute * 3):
              fmt.Println("timeout")
          }
      }()
  }
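To see where that memory is actually held rather than just watching top, a heap profile makes the time.After allocations visible. One way to do that, assuming the test program is extended with the standard net/http/pprof endpoints (this is my addition, not part of the snippet above), is:

  // Hypothetical addition to the test program to expose heap profiles.
  import (
      "net/http"
      _ "net/http/pprof" // registers the /debug/pprof handlers
  )

  func init() {
      go func() {
          // Port choice is arbitrary.
          _ = http.ListenAndServe("localhost:6060", nil)
      }()
  }

The profile can then be inspected with go tool pprof http://localhost:6060/debug/pprof/heap; in the time.After variant, the live memory attributed to time.After (via time.NewTimer) keeps growing until the oldest timers start expiring.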

References