Overview

When working with HTTP, whether in a browser or through an SDK’s HTTP client, we often want to reuse connections, that is, to send multiple HTTP requests and responses over the same connection. This is commonly known as HTTP persistent connections (Keep-Alive).

In this article, we will explore persistent connections from both the protocol support perspective and its practical implementation in Go.

Protocol Support

The RFC for HTTP/1.1 dedicates an entire section to discussing persistent connections. Right at the beginning, it outlines the benefits of using persistent connections:

  • Reduces resource consumption across all involved devices (such as CPU and memory usage on clients, servers, and routers).
  • Enables multiple requests and responses to be pipelined over the same connection, improving connection efficiency.
  • Decreases the number of TCP connection establishments and closures, reducing network congestion by lowering the overall number of transmitted packets.
  • Lowers response latency by eliminating the overhead of establishing new connections.
  • Improves error handling since persistent connections allow for error reporting, whereas short connections can only indicate failures by terminating the connection.

According to the HTTP/1.1 RFC, persistent connections are the default behavior in HTTP/1.1. A connection is only treated as short-lived if either the client or the server explicitly sends a Connection: close header.
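
Jumping ahead to Go for a moment, a client can explicitly opt back into short-lived connections. A minimal sketch (the URL is only a placeholder): setting Request.Close sends Connection: close for a single request, while Transport.DisableKeepAlives turns keep-alive off for every request made through that transport.

  package main

  import (
      "fmt"
      "net/http"
  )

  func main() {
      // Per-request opt-out: ask the transport to close the connection after
      // this request/response instead of returning it to the idle pool.
      req, err := http.NewRequest(http.MethodGet, "https://example.com", nil)
      if err != nil {
          panic(err)
      }
      req.Close = true // results in a "Connection: close" request header

      resp, err := http.DefaultClient.Do(req)
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()
      fmt.Println(resp.Status)

      // Per-transport opt-out: no connection made through this client is reused.
      shortLived := &http.Client{Transport: &http.Transport{DisableKeepAlives: true}}
      _ = shortLived
  }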

For persistent connections, if the connection is unexpectedly closed mid-transfer, the client should retry idempotent requests but must not automatically retry non-idempotent ones. The HTTP/1.1 RFC lists GET, HEAD, PUT, and DELETE (along with OPTIONS and TRACE, which are inherently idempotent) as idempotent request methods.

Go Implementation

Next, let’s explore how Go 1.22 handles the HTTP/1.1 protocol, with separate discussions for the Client and Server implementations.

Client

In Go, connection management is handled through the Transport layer, which implements the RoundTripper interface:

  [root@liqiang.io]# cat client.go
  type RoundTripper interface {
      RoundTrip(*Request) (*Response, error)
  }
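
To make the interface concrete, here is a minimal, purely illustrative implementation: a hypothetical loggingRoundTripper (the name is ours, not from net/http) that logs each request and delegates the actual work to another RoundTripper.

  package main

  import (
      "log"
      "net/http"
  )

  // loggingRoundTripper wraps another RoundTripper and logs every exchange.
  type loggingRoundTripper struct {
      next http.RoundTripper // the transport that actually sends the request
  }

  func (l *loggingRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
      log.Printf("-> %s %s", req.Method, req.URL)
      resp, err := l.next.RoundTrip(req)
      if err != nil {
          return nil, err
      }
      log.Printf("<- %s", resp.Status)
      return resp, nil
  }

  func main() {
      client := &http.Client{Transport: &loggingRoundTripper{next: http.DefaultTransport}}
      resp, err := client.Get("https://example.com")
      if err != nil {
          log.Fatal(err)
      }
      defer resp.Body.Close()
      log.Println("done")
  }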

In everyday code, however, we rarely implement RoundTripper ourselves: unless a Transport is set explicitly, the HTTP client falls back to Go’s built-in DefaultTransport:

  [root@liqiang.io]# cat transport.go
  var DefaultTransport RoundTripper = &Transport{
      Proxy: ProxyFromEnvironment,
      DialContext: defaultTransportDialContext(&net.Dialer{
          Timeout:   30 * time.Second,
          KeepAlive: 30 * time.Second,
      }),
      ForceAttemptHTTP2:     true,
      MaxIdleConns:          100,
      IdleConnTimeout:       90 * time.Second,
      TLSHandshakeTimeout:   10 * time.Second,
      ExpectContinueTimeout: 1 * time.Second,
  }

  func (t *Transport) roundTrip(req *Request) (*Response, error) {
      ... ...
      for {
          select {
          case <-ctx.Done():
              req.closeBody()
              return nil, ctx.Err()
          default:
          }
          treq := &transportRequest{Request: req, trace: trace, cancelKey: cancelKey}
          cm, err := t.connectMethodForRequest(treq)
          ... ...
          pconn, err := t.getConn(treq, cm)
          ... ...
          resp, err = pconn.roundTrip(treq)
          ... ...
          } else if !pconn.shouldRetryRequest(req, err) {
              ... ...
              return nil, err
          }
          ... ...
          // Rewind the body if we're able to.
          req, err = rewindBody(req)
          if err != nil {
              return nil, err
          }
      }
  }

From this simple implementation, we can see that Go’s HTTP Client uses persistent connections by default. The default transport keeps up to 100 idle connections in its pool (MaxIdleConns) and closes a connection after it has been idle for 90 seconds (IdleConnTimeout). Additionally, in certain cases, the client will automatically retry requests. Below are some scenarios where retries may occur:

  • When too many streams are opened on a single connection in an HTTP/2 scenario.
  • When the request has no body, or the body can be re-acquired via GetBody, and the request is considered replayable (GET, HEAD, OPTIONS, and TRACE methods, or any request carrying an Idempotency-Key header; see the sketch after this list).
  • When a non-EOF error occurs before reading the first byte of the response.
  • When the server closes an idle connection.
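
On the second point, whether a body can be re-acquired is determined by the request’s GetBody field. A request built with http.NewRequest and a *bytes.Reader, *bytes.Buffer, or *strings.Reader body gets GetBody filled in automatically, so the transport can rewind and resend it; a plain streaming io.Reader does not. A minimal sketch (URL and payload are placeholders):

  package main

  import (
      "bytes"
      "fmt"
      "net/http"
  )

  func main() {
      payload := []byte(`{"hello":"world"}`)

      // NewRequest recognizes *bytes.Reader and populates req.GetBody,
      // which lets the transport re-create the body if it retries the request.
      req, err := http.NewRequest(http.MethodPost, "https://example.com/api", bytes.NewReader(payload))
      if err != nil {
          panic(err)
      }
      fmt.Println("rewindable:", req.GetBody != nil) // true

      // A request whose body is an arbitrary io.Reader (a pipe, a socket, ...)
      // has GetBody == nil and is generally not retried after a failure.
  }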

From the above code snippet, we can see that Go’s HTTP client implements these features. Now, let’s dive deeper into the underlying code logic to explore additional details:

  [root@liqiang.io]# cat transport.go
  func (t *Transport) getConn(treq *transportRequest, cm connectMethod) (pc *persistConn, err error) {
      ... ...
      w := &wantConn{
          cm:         cm,
          key:        cm.key(),
          ctx:        ctx,
          ready:      make(chan struct{}, 1),
          beforeDial: testHookPrePendingDial,
          afterDial:  testHookPostPendingDial,
      }
      if delivered := t.queueForIdleConn(w); delivered {
          pc := w.pc
          ... ...
          t.setReqCanceler(treq.cancelKey, func(error) {})
          return pc, nil
      }
      ... ...
  }

  func (t *Transport) queueForIdleConn(w *wantConn) (delivered bool) {
      if t.DisableKeepAlives {
          return false
      }
      t.idleMu.Lock()
      defer t.idleMu.Unlock()
      ... ...
      var oldTime time.Time
      if t.IdleConnTimeout > 0 {
          oldTime = time.Now().Add(-t.IdleConnTimeout)
      }
      // Look for most recently-used idle connection.
      if list, ok := t.idleConn[w.key]; ok {
          stop := false
          delivered := false
          for len(list) > 0 && !stop {
              pconn := list[len(list)-1]
              tooOld := !oldTime.IsZero() && pconn.idleAt.Round(0).Before(oldTime)
              if tooOld {
                  go pconn.closeConnIfStillIdle()
              }
              if pconn.isBroken() || tooOld {
                  list = list[:len(list)-1]
                  continue
              }
              delivered = w.tryDeliver(pconn, nil)
              if delivered {
                  if pconn.alt != nil {
                  } else {
                      t.idleLRU.remove(pconn)
                      list = list[:len(list)-1]
                  }
              }
              stop = true
          }
          if len(list) > 0 {
              t.idleConn[w.key] = list
          } else {
              delete(t.idleConn, w.key)
          }
          if stop {
              return delivered
          }
      }
      if t.idleConnWait == nil {
          t.idleConnWait = make(map[connectMethodKey]wantConnQueue)
      }
      q := t.idleConnWait[w.key]
      q.cleanFront()
      q.pushBack(w)
      t.idleConnWait[w.key] = q
      return false
  }

From the code, we can see that Go’s HTTP/1.1 transport does not pipeline requests over a connection, even though the RFC allows it; each connection carries only one in-flight request at a time. To manage multiple connections efficiently, Go maintains a connection pool, where idle connections are indexed by a key derived from the connect method (the proxy, the scheme, and the target address).

Another interesting detail in the code is the idleLRU (Least Recently Used) structure. When the number of idle connections exceeds the configured limit, Go closes the least recently used idle connection to free up resources.
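
If the defaults do not fit, the pool can be tuned on a custom Transport. A minimal sketch, assuming we want a larger per-host pool and a shorter idle timeout (the numbers are arbitrary):

  package main

  import (
      "net/http"
      "time"
  )

  func main() {
      t := &http.Transport{
          MaxIdleConns:        200,              // total idle connections kept across all hosts
          MaxIdleConnsPerHost: 20,               // idle connections kept per host (the default is 2)
          MaxConnsPerHost:     50,               // optional hard cap on connections per host
          IdleConnTimeout:     30 * time.Second, // close idle connections sooner than the 90s default
      }
      client := &http.Client{Transport: t, Timeout: 10 * time.Second}
      _ = client // use client.Get / client.Do as usual
  }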

Returning Connections to the Pool

From the request-sending logic discussed earlier, we can see that idle connections are handed out by func (t *Transport) queueForIdleConn(w *wantConn) (delivered bool), called from getConn. Based on its implementation, we know that connections are stored in the idleConn connection pool. Since connections are borrowed, they must also be returned at some point. So, when exactly does this happen?

First, near the code responsible for obtaining connections, we can see the function func (t *Transport) tryPutIdleConn(pconn *persistConn) error. Based on its implementation, this function appears to be responsible for returning connections. To confirm this, we need to trace where this function is called. Excluding some exceptional handling cases, there are two main scenarios to focus on:

  • When an error occurs while processing a request. However, this does not refer to network errors (in which case the connection should be closed), but rather errors related to internal state handling.
  • When a request is completed and the response body has been fully read. The condition for being “fully read” is that the read operation returns an EOF.

I am particularly interested in the second scenario, so let’s walk through the code to analyze it further.

  [root@liqiang.io]# cat transport.go
  select {
  case bodyEOF := <-waitForBodyRead:
      replaced := pc.t.replaceReqCanceler(rc.cancelKey, nil) // before pc might return to idle pool
      alive = alive &&
          bodyEOF &&
          !pc.sawEOF &&
          pc.wroteRequest() &&
          replaced && tryPutIdleConn(trace)
      if bodyEOF {
          eofc <- struct{}{}
      }
  case <-rc.req.Cancel:
      alive = false
      pc.t.cancelRequest(rc.cancelKey, errRequestCanceled)
  case <-rc.req.Context().Done():
      alive = false
      pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
  case <-pc.closech:
      alive = false
  }

From this point, we can see that a connection is returned to the pool only when bodyEOF is true (along with the other conditions in the && chain). But how is bodyEOF produced?

Looking further into the code, we can see that there is a wrapped Body object:

  [root@liqiang.io]# cat transport.go
  body := &bodyEOFSignal{
      body: resp.Body,
      earlyCloseFn: func() error {
          waitForBodyRead <- false
          <-eofc // will be closed by deferred call at the end of the function
          return nil
      },
      fn: func(err error) error {
          isEOF := err == io.EOF
          waitForBodyRead <- isEOF
          if isEOF {
              <-eofc // see comment above eofc declaration
          } else if err != nil {
              if cerr := pc.canceled(); cerr != nil {
                  return cerr
              }
          }
          return err
      },
  }
  resp.Body = body

Then, this wrapped object implements the Read method:

  [root@liqiang.io]# cat transport.go
  func (es *bodyEOFSignal) Read(p []byte) (n int, err error) {
      es.mu.Lock()
      closed, rerr := es.closed, es.rerr
      es.mu.Unlock()
      if closed {
          return 0, errReadOnClosedResBody
      }
      if rerr != nil {
          return 0, rerr
      }
      n, err = es.body.Read(p)
      if err != nil {
          es.mu.Lock()
          defer es.mu.Unlock()
          if es.rerr == nil {
              es.rerr = err
          }
          err = es.condfn(err)
      }
      return
  }

So, when we call Read in our business logic, it actually invokes this wrapped object’s Read method. From the implementation, we can see that if an error occurs during reading, it is passed to the fn function for handling.

The logic inside fn determines the type of error and writes the result into a channel. This result is then picked up by the previously mentioned select statement, which checks whether the error is an EOF. If it is, the connection will be returned to the pool.

Server

Compared to the client, Go’s server-side implementation is relatively simple. In terms of the goroutine model, it spawns one goroutine per accepted connection, and that goroutine serves the connection’s requests one after another:

  [root@liqiang.io]# cat server.go
  func (srv *Server) Serve(l net.Listener) error {
      for {
          rw, err := l.Accept()
          ... ...
          connCtx := ctx
          ... ...
          tempDelay = 0
          c := srv.newConn(rw)
          c.setState(c.rwc, StateNew, runHooks) // before Serve can return
          go c.serve(connCtx)
      }
  }

  func (c *conn) serve(ctx context.Context) {
      if ra := c.rwc.RemoteAddr(); ra != nil {
          c.remoteAddr = ra.String()
      }
      ctx = context.WithValue(ctx, LocalAddrContextKey, c.rwc.LocalAddr())
      var inFlightResponse *response
      ... ...
      ctx, cancelCtx := context.WithCancel(ctx)
      c.cancelCtx = cancelCtx
      defer cancelCtx()
      c.r = &connReader{conn: c}
      c.bufr = newBufioReader(c.r)
      c.bufw = newBufioWriterSize(checkConnErrorWriter{c}, 4<<10)
      for {
          w, err := c.readRequest(ctx)
          if c.r.remain != c.server.initialReadLimitSize() {
              // If we read any bytes off the wire, we're active.
              c.setState(c.rwc, StateActive, runHooks)
          }
          ... ...
          req := w.req
          ... ...
          c.curReq.Store(w)
          ... ...
          inFlightResponse = w
          serverHandler{c.server}.ServeHTTP(w, w.req)
          inFlightResponse = nil
          w.cancelCtx()
          if c.hijacked() {
              return
          }
          ... ...
          w.finishRequest()
          c.rwc.SetWriteDeadline(time.Time{})
          c.setState(c.rwc, StateIdle, runHooks)
          c.curReq.Store(nil)
          if !w.conn.server.doKeepAlives() {
              return
          }
          if d := c.server.idleTimeout(); d != 0 {
              c.rwc.SetReadDeadline(time.Now().Add(d))
          } else {
              c.rwc.SetReadDeadline(time.Time{})
          }
          if _, err := c.bufr.Peek(4); err != nil {
              return
          }
          c.rwc.SetReadDeadline(time.Time{})
      }

From this implementation, we can see that server-side connections are also reused, and, as on the client, each connection processes at most one active request at a time. However, not every connection is kept alive; this is decided by the shouldReuseConnection function. Below are the scenarios where a connection cannot be reused (a server-side configuration sketch follows the list):

  • When the request is too large, or the client has explicitly sent a Connection: close header.
  • When the response is incomplete (usually due to a write error).
  • When there is a write error in the connection.
  • When the request reading encounters an error.
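
On the configuration side, the server exposes a few knobs for this behavior: Server.IdleTimeout bounds how long a kept-alive connection may sit idle between requests, and SetKeepAlivesEnabled(false) forces Connection: close on every response. A minimal sketch (address, handler, and timeout values are placeholders):

  package main

  import (
      "fmt"
      "net/http"
      "time"
  )

  func main() {
      mux := http.NewServeMux()
      mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
          fmt.Fprintln(w, "hello")
      })

      srv := &http.Server{
          Addr:         ":8080",
          Handler:      mux,
          ReadTimeout:  5 * time.Second,
          WriteTimeout: 10 * time.Second,
          IdleTimeout:  60 * time.Second, // how long a keep-alive connection may stay idle
      }

      // Uncomment to disable keep-alive entirely; every response then carries
      // "Connection: close" and the connection is torn down after one request.
      // srv.SetKeepAlivesEnabled(false)

      if err := srv.ListenAndServe(); err != nil {
          panic(err)
      }
  }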

Practice

Now that we’ve gone through most of the code, it’s time to verify our understanding with a practical example. I have uploaded all the relevant code to a dedicated repository:

🔗 GitHub Repository

If you’re interested, feel free to download and experiment with it yourself!

  [root@liqiang.io]# cat main2.go
  package main

  import (
      "fmt"
      "io"
      "net/http"
      "time"
  )

  func main() {
      for i := 0; i < 101; i++ {
          func() {
              resp, err := http.Get("https://gobyexample.com")
              if err != nil {
                  panic(err)
              }
              defer resp.Body.Close()
              fmt.Println("Response status:", resp.Status)
              fmt.Println("Connection Header:", resp.Header.Get("Connection"))
              _, _ = io.ReadAll(resp.Body)
          }()
          time.Sleep(time.Millisecond * 500)
      }
  }

When we run this program and check the number of connections in another terminal, we will notice that there is only one active connection:

  [root@liqiang.io]# ss -antp | grep "13.35.238"
  ESTAB 0 0 192.168.121.195:44664 13.35.238.63:443 users:(("main",pid=87455,fd=6))

This reveals an interesting fact: even though we call http.Get multiple times, every call goes through the shared DefaultClient and thus the shared DefaultTransport, so the same persistent connection is reused. In other words, by default Go’s HTTP client leverages long-lived connections.

However, if you close the response body without reading it to the end, things might not work as expected. For instance, take a look at the following code; can you spot the issue?

  [root@liqiang.io]# cat main.go
  package main

  import (
      "bufio"
      "fmt"
      "net/http"
      "time"
  )

  func main() {
      for i := 0; i < 101; i++ {
          func() {
              resp, err := http.Get("https://gobyexample.com")
              if err != nil {
                  panic(err)
              }
              defer resp.Body.Close()
              fmt.Println("Response status:", resp.Status)
              scanner := bufio.NewScanner(resp.Body)
              for i := 0; scanner.Scan() && i < 5; i++ {
                  fmt.Println(scanner.Text())
              }
              if err := scanner.Err(); err != nil {
                  panic(err)
              }
          }()
          time.Sleep(time.Millisecond * 500)
      }
  }

When we run this program and check the connection status again, we will see that a new connection is created for each request:

  [root@liqiang.io]# ss -antp | grep "13.35.238"
  TIME-WAIT 0 0 192.168.121.195:59538 13.35.238.114:443
  TIME-WAIT 0 0 192.168.121.195:52688 13.35.238.70:443
  TIME-WAIT 0 0 192.168.121.195:44890 13.35.238.63:443
  TIME-WAIT 0 0 192.168.121.195:54294 13.35.238.22:443
  TIME-WAIT 0 0 192.168.121.195:44916 13.35.238.63:443
  TIME-WAIT 0 0 192.168.121.195:54306 13.35.238.22:443
  TIME-WAIT 0 0 192.168.121.195:44906 13.35.238.63:443

If you analyze the code again using the same logic as before, you will quickly discover the issue. As we previously mentioned, a connection is only returned to the pool when one of the following conditions is met:

  1. An internal processing error occurs (excluding network failures).
  2. The response body has been fully read and an EOF is encountered.

The issue here is that the response body is not fully read before calling Close. Since the connection is not properly released back to the pool, it cannot be reused.

This highlights an important takeaway when writing Go HTTP client code: even if you don’t need the entire response body, it’s best to read it completely before closing. This ensures that the connection can be reused, improving efficiency and reducing unnecessary connection overhead.
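
A common way to express this is a small helper that drains and then closes the body. A minimal sketch (drainAndClose is a name we made up; it is not part of net/http):

  package main

  import (
      "fmt"
      "io"
      "net/http"
  )

  // drainAndClose reads whatever is left in the body and closes it, so the
  // underlying connection stays eligible for the idle pool.
  func drainAndClose(body io.ReadCloser) {
      _, _ = io.Copy(io.Discard, body)
      _ = body.Close()
  }

  func main() {
      resp, err := http.Get("https://gobyexample.com")
      if err != nil {
          panic(err)
      }
      defer drainAndClose(resp.Body)

      fmt.Println("Response status:", resp.Status)
      // ... read only as much of the body as you actually need ...
  }

One caveat: if the remaining body is very large, draining it can cost more than simply closing the connection and opening a new one later, so this trick pays off mainly when only a small tail is left unread.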

Middleware

In real-world applications, we usually do not expose Go’s HTTP client directly to external services. Instead, middleware components such as load balancers (e.g., Nginx) are often placed in between. When using persistent connections, it is important to consider the middleware’s handling of keep-alive connections. Below, we will use Nginx as an example to discuss this aspect.

Nginx

Nginx’s keep-alive parameters can be categorized into two sections:

  1. Nginx <-> Upstream (Backend Servers)
  2. Client <-> Nginx

According to Nginx’s documentation, the key keepalive parameters are as follows:

Scope | Parameter | Syntax | Default | Description
--- | --- | --- | --- | ---
Client | keepalive_disable | keepalive_disable none \| msie6 \| safari ...; | keepalive_disable msie6; | Disables keep-alive responses for specific (buggy) browsers.
Client | keepalive_requests | keepalive_requests number; | keepalive_requests 1000; | Maximum number of requests a single client connection can handle.
Client | keepalive_time | keepalive_time time; | keepalive_time 1h; | Maximum lifetime of a client connection before it is closed.
Client | keepalive_timeout | keepalive_timeout timeout [header_timeout]; | keepalive_timeout 75s; | Maximum idle time for a client connection; the optional second argument emits a Keep-Alive: timeout=time header for browsers that honor it.
Upstream | keepalive | keepalive connections; | - | Maximum number of idle keep-alive connections to upstream servers preserved in the cache of each worker process.
Upstream | keepalive_requests | keepalive_requests number; | keepalive_requests 1000; | Maximum number of requests a single upstream connection can handle.
Upstream | keepalive_time | keepalive_time time; | keepalive_time 1h; | Maximum lifetime of each upstream connection before it is closed.
Upstream | keepalive_timeout | keepalive_timeout timeout; | keepalive_timeout 60s; | Idle timeout for upstream connections.
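
One practical pitfall worth calling out: keep-alive between Nginx and an upstream only works if the proxied requests use HTTP/1.1 and the Connection header is not forwarded as-is. A minimal sketch of such a configuration (the upstream name, address, and numbers are placeholders):

  upstream backend {
      server 127.0.0.1:8080;
      keepalive 32;             # idle keep-alive connections per worker to this upstream
      keepalive_requests 1000;  # recycle a connection after this many requests
      keepalive_timeout 60s;    # close idle upstream connections after 60s
  }

  server {
      listen 80;
      keepalive_timeout 75s;    # client-side idle timeout
      keepalive_requests 1000;

      location / {
          proxy_pass http://backend;
          proxy_http_version 1.1;          # upstream keep-alive requires HTTP/1.1
          proxy_set_header Connection "";  # clear the Connection header passed upstream
      }
  }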

References