Bug report criteria
What happened?
EtcdManualResolver.Build() in client/v3/internal/resolver/resolver.go sends two resolver updates to gRPC in rapid
succession with different ServiceConfig values. This causes gRPC to switch balancers mid-connection, killing an
in-flight SubConn and producing warnings:
```
[core] [Channel #2 SubChannel #5] grpc: addrConn.createTransport failed to connect to
{Addr: "127.0.0.1:2379", ...}. Err: connection error: desc = "transport: Error while dialing:
dial tcp 127.0.0.1:2379: operation was canceled"
```
All etcd operations succeed, but each occurrence wastes resources — a throwaway TCP dial, TLS handshake, and SubConn
teardown per connection. In applications that create etcd clients frequently, this adds up to unnecessary CPU and
network overhead, unbounded channelz ID growth, and persistent warning log noise that obscures real issues.
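Concretely, the two updates gRPC receives look roughly like this (an illustrative sketch; endpoints and roundRobinConfig are stand-in names, and pick_first is gRPC's documented default policy when a state carries no service config):

```go
// Update #1, sent by the embedded manual resolver's Build():
resolver.State{Addresses: endpoints}
// no ServiceConfig → gRPC falls back to its default pick_first policy and starts dialing

// Update #2, sent by r.updateState() moments later:
resolver.State{Addresses: endpoints, ServiceConfig: roundRobinConfig}
// → gRPC switches balancers; the in-flight pick_first SubConn is shut down,
//   which logs "transport: Error while dialing ... operation was canceled"
```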
What did you expect to happen?
A single resolver update with the complete state (endpoints + round_robin ServiceConfig), producing no spurious
warnings.
How can we reproduce it (as minimally and precisely as possible)?
Any kube-apiserver connecting to etcd will produce these warnings. The frequency depends on how often new etcd clients
are created. Every new grpc.ClientConn created through the etcd client triggers the double resolver update in Build(). This includes:
- newClient() — initial client creation
- client.Dial() — used by maintenance.Status(), maintenance.HashKV(), and other per-endpoint operations
Any application that creates etcd clients frequently or creates multiple clients concurrently will see these warnings.
The race is timing-dependent: it reproduces reliably under load (many concurrent client initializations) but may not
reproduce in a small standalone binary where TCP dials complete in under 1 ms.
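For a standalone attempt, a sketch along these lines (hypothetical; assumes a local etcd at 127.0.0.1:2379) stacks concurrent client initializations to widen the race window. Running it with GRPC_GO_LOG_SEVERITY_LEVEL=warning should surface the transport warnings whenever the race fires:

```go
// Hypothetical reproduction sketch: create many etcd clients concurrently so
// the second resolver update lands while the first TCP dial is still in flight.
package main

import (
	"context"
	"sync"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cli, err := clientv3.New(clientv3.Config{
				Endpoints:   []string{"127.0.0.1:2379"},
				DialTimeout: 5 * time.Second,
			})
			if err != nil {
				return
			}
			defer cli.Close()
			// maintenance.Status() goes through client.Dial(), creating another
			// grpc.ClientConn and therefore another double resolver update.
			_, _ = cli.Status(context.TODO(), "127.0.0.1:2379")
		}()
	}
	wg.Wait()
}
```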
Anything else we need to know?
Root cause
Build() currently does:
```go
func (r *EtcdManualResolver) Build(target resolver.Target, cc resolver.ClientConn,
	opts resolver.BuildOptions) (resolver.Resolver, error) {
	r.serviceConfig = cc.ParseServiceConfig(`{"loadBalancingPolicy": "round_robin"}`)
	if r.serviceConfig.Err != nil {
		return nil, r.serviceConfig.Err
	}
	res, err := r.Resolver.Build(target, cc, opts) // ← sends update #1
	if err != nil {
		return nil, err
	}
	r.updateState() // ← sends update #2
	return res, nil
}
```
Proposed fix
Move r.updateState() before r.Resolver.Build():
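A sketch of the reordered method (this assumes updateState records the state even before a ClientConn is attached, for example via the manual resolver's initial-state mechanism, rather than returning early when CC is nil; the exact mechanics belong to the actual patch):

```go
func (r *EtcdManualResolver) Build(target resolver.Target, cc resolver.ClientConn,
	opts resolver.BuildOptions) (resolver.Resolver, error) {
	r.serviceConfig = cc.ParseServiceConfig(`{"loadBalancingPolicy": "round_robin"}`)
	if r.serviceConfig.Err != nil {
		return nil, r.serviceConfig.Err
	}
	// Seed the complete state (endpoints + ServiceConfig) first, so the embedded
	// manual resolver's Build() delivers it as the one and only update.
	r.updateState()
	return r.Resolver.Build(target, cc, opts) // ← single, complete update
}
```

With the complete state in place before the first update, gRPC builds the round_robin balancer once and never tears down an in-flight SubConn.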
Etcd version (please run commands below)
N/A — the bug is in the etcd client library (go.etcd.io/etcd/client/v3), not the server.
Confirmed affected:
- etcd client: v3.6.5
- etcd server: v3.6.7 / v3.6.10
- grpc-go: v1.72.2
The buggy code in client/v3/internal/resolver/resolver.go has not changed
across v3.6.x releases — all versions using EtcdManualResolver are affected.
Etcd configuration (command line flags or environment variables)
N/A (client-side bug; no server configuration is involved).
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
N/A (client-side bug; server state is unaffected, so no etcdctl output is relevant).
Relevant log output
See the addrConn.createTransport warning quoted under "What happened?" above.