Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd client v3 v3.5.0 sends invalid :authority header field value #13192

Closed
tatsuhiro-t opened this issue Jul 7, 2021 · 22 comments · Fixed by #13359
Closed

etcd client v3 v3.5.0 sends invalid :authority header field value #13192

tatsuhiro-t opened this issue Jul 7, 2021 · 22 comments · Fixed by #13359
Assignees
Labels
area/clientv3 priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@tatsuhiro-t
Copy link

Starting from etcd v3.5.0, etcd client v3 sends invalid :authority header field value which violates RFC.

Suppose we created a client like so:

        c, err := clientv3.New(clientv3.Config{
                Endpoints: []string{
                        "https://example.com:2379",
                },
                ...
        }  

And started making a connection and tried to get some stuff from etcd server.

In the previous etcd client v3, it sends :authority header field value like this:

example.com:2379

But now with the v3.5.0, it sends like this:

#initially=[https://example.com:2379]

According to the RFC7540, :authority header field contains the authority portion of the target URI.
RFC3986 dictates that authority is:

authority   = [ userinfo "@" ] host [ ":" port ]
host        = IP-literal / IPv4address / reg-name
IP-literal = "[" ( IPv6address / IPvFuture  ) "]"
IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
reg-name    = *( unreserved / pct-encoded / sub-delims )
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

#initially=[https://example.com:2379] is not a valid authority.

It seems to me that there might be a bug in the code that authority is not parsed or extracted correctly.
The annoying fact of this issue is that it works if server does not validate the invalid field value. But if you have a proxy between client and server which checks some non-compliant octets, then proxy refuses a request.

@tatsuhiro-t
Copy link
Author

Any news? This issue prevents us from using a proxy between etcd client and server.

@gbolo
Copy link

gbolo commented Aug 24, 2021

Please fix this issue, as linkerd-proxy (using the a rust h2 lib) prevents this traffic from passing due to this invalid authority header value!

@akunszt
Copy link

akunszt commented Aug 27, 2021

We also hit by this issue. This prevents us to use AWS Application Load Balancers in front of our etcd clusters and that blocks upgrading. Also this breaks Kubernetes 1.22 as the apiserver couldn't connect to the etcd cluster because of this.

@harshavardhana
Copy link

Looks like an expected implementation

https://github.com/etcd-io/etcd/blob/55b7b745891e816b257681a297a151ca14536fa1/CHANGELOG-3.5.md#package-clientv3

Endpoints self identify now as etcd-endpoints://{id}/#initially={list of endpoints} e.g. etcd-endpoints://0xc0009d8540/#initially=[localhost:2079]

target := fmt.Sprintf("%s://%p/#initially=[%s]", resolver.Schema, c, initialEndpoints)

There has been some kind of protocol-level change here which is perhaps breaking the etcd clients going through proxy?

@akunszt
Copy link

akunszt commented Sep 3, 2021

As far as I understand the etcd uses the :authority HTTP header in a way which is not valid according to the RFC. It looks like the Go implementation doesn't validate it but most of the network proxies do and because it's an invalid HTTP/2 request then they will drop it.
I think the etcd should use an :authority header conform to the RFC and maybe add another header to carry that information. Assuming that if it's really necessary to be in the headers and it's not an unintentional side-effect.

@gbolo
Copy link

gbolo commented Sep 3, 2021

Looks like an expected implementation

https://github.com/etcd-io/etcd/blob/55b7b745891e816b257681a297a151ca14536fa1/CHANGELOG-3.5.md#package-clientv3

Endpoints self identify now as etcd-endpoints://{id}/#initially={list of endpoints} e.g. etcd-endpoints://0xc0009d8540/#initially=[localhost:2079]

target := fmt.Sprintf("%s://%p/#initially=[%s]", resolver.Schema, c, initialEndpoints)

There has been some kind of protocol-level change here which is perhaps breaking the etcd clients going through proxy?

the expected implmentation did not occur here:

expected:
etcd-endpoints://0xc0009d8540/#initially=[localhost:2079]

reality:
#initially=[localhost:2079]

someone needs to troubleshoot why that is. Perhaps an upstream library (maybe grpc) is having trouble parsing the expected value and is producing this partial value instead. This release should be considered broken until it's fixed. And since this header is so important for proxying to actually work, perhaps a test case needs to be added to ensure it has the expected value.

@tatsuhiro-t
Copy link
Author

Obviously sending etcd-endpoints://0xc0009d8540/#initially=[localhost:2079] in :authority is not correct because the former is not any kind of a remote host.
:authority must be a host name or an IP address of remote host. In this case that is "localhsot:2079".

@ptabor ptabor pinned this issue Sep 6, 2021
@ptabor ptabor modified the milestone: etcd-v3.5 Sep 6, 2021
@ptabor ptabor added area/clientv3 priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Sep 6, 2021
blairdrummond pushed a commit to blairdrummond/minio-argocd-manifests that referenced this issue Sep 6, 2021
@ptabor
Copy link
Contributor

ptabor commented Sep 6, 2021

@menghanl Could you, please, recommend a solution related to grpc client side load-balancing ?

  1. Etcd client can be configured using multiple endpoints and is doing round-robin between these endpoints using grpc resolver API (custom resolver). The resolver uses internal etcd-endpoints://0xc0009d8540/#initially=[localhost:2079] naming to define the 'target'.

  2. grpc seems to be using target from pre-resolver execution 'as authority' (if creds or explicit override is not specified):

    https://github.com/grpc/grpc-go/blob/b2ba77a36ff809ab344b98368d9ecc3e12f943d6/clientconn.go#L269

    I would expect it to be using ServerName produced by the resolver:

    addresses[i] = resolver.Address{Addr: addr, ServerName: serverName}

Double checking it's the expected behaviour.


Assuming the grpc behaviour is WAI, I see following options:

A) For single-endpoint clients do not register the resolver
As etcd client supports SetEndpoints method that allows to reconfigure in runtime the set of endpoints. For this reason we create the resolver even if user specified only 1 endpoint, to allow later setting multiple endpoints.
(We can consider a breaking? change where user needs to opt-in for resolver to be able to use SetEndpoint, otherwise the endpoint is used directly).

B) Configure grpc explicitly using WithAuthority, providing the arbitrary chosen endpoint if there are multiple given by the user.

@menghanl
Copy link

menghanl commented Sep 7, 2021

The authority is not set to the server name. This is grpc/grpc-go#4516.
It will be fixed, as we are making changes around authority (tracked in grpc/grpc-go#4717).


Does the client use TLS or other creds?
WithAuthority would be a good solution for now, but it only works with InSecure.
For secure clients, overriding in the transport creds would also work.

@serathius
Copy link
Member

Hey, I can take a look into implementing the fix. I have only one question about solution proposed by @ptabor in #13192 (comment)

B) Configure grpc explicitly using WithAuthority, providing the arbitrary chosen endpoint if there are multiple given by the user.

Is it ok to just pick first endpoint? or maybe allow users to provide their own authority

@ptabor
Copy link
Contributor

ptabor commented Sep 16, 2021

I would pick the first endpoint and wait for use-cases that would require having an user-driven authority.

@serathius
Copy link
Member

sg, on question of tests. What is the best way to test authority value. I see two option:

  • Integration test that adds grpc interceptor that records requests and stores authority header value
  • E2e tests that runs server with GODEBUG=http2debug=2, this way server will log http2: decoded hpack field header field ":authority" = "127.0.0.1:2379". We can check server logs to find authority value.

Any preferences or other suggestions?

@tatsuhiro-t
Copy link
Author

One concern when you use wrong authority in :authority header field is that a proxy might fail to forward request if it expects the correct host in :authority. Imagine that proxy covers various services and routes the request based on host or :authority header field.

@serathius
Copy link
Member

Sent a draft PR, PTAL

@lixianwa
Copy link

I had a same issus, when use istio service mesh. the etcd client can not connect etcd server because of the request [:authority] header of client is invalid, from rfc3986, the http2 authority header should be this

authority = [ userinfo "@" ] host [ ":" port ]

but now etcd clientv3 authority header like it

#initially=[etcd-headless.infra:2379]

so istio envoy proxy can not forword request and print error:

2021-09-18T02:11:53.363178Z debug envoy http2 [C2750] invalid http2: Invalid HTTP header field was received: frame type: 1, stream: 1, name: [:authority], value: [#initially=[etcd:2379]]
2021-09-18T02:11:53.363183Z debug envoy http2 [C2750] invalid frame: Invalid HTTP header field was received on stream 1

some one has a solution about this?

@ptabor
Copy link
Contributor

ptabor commented Sep 20, 2021

Marek keeps looking into this problems and facing different trade-offs.
Do your use-cases require client side load-balancing, or the load-balancing is done in the Envoy / istio layer,
i.e. do you specify multiple endpoints in etcd-client configuration, or just a single endpoint pointing on the proxy ?

@tatsuhiro-t
Copy link
Author

In our usecase, we specify multiple endpoints to etcd client configuration. At the same time, we have a client-facing proxy at the remote endpoint for some reason. So the load balancing is done at the client, and we use proxy at the remote.

@serathius
Copy link
Member

In our usecase, we specify multiple endpoints to etcd client configuration. At the same time, we have a client-facing proxy at the remote endpoint for some reason. So the load balancing is done at the client, and we use proxy at the remote.

This means that we need to provide proper authority to each of the endpoints. Solution proposed by @ptabor would not be enough to fix this. Problem with authority was caused when switching from internal loadbalancing implementation to official grpc one. Based on grpc/grpc-go#4717 it doesn't look that current grpc implementation is not able to support what we need. This prompts me to propose revert to internal implementation.

When trying to implement @ptabor I stumbled upon lot's of problems with tests. Even small changes in logic could result in code going into different URL parser branch and break some tests. I also found discrepancies between integration and e2e tests on edge cases. For example "unix://localhost:m00" address cannot be used in e2e (not correct URL due to bad port), but is used by integration tests that don't use same validation and depend on multiple edge cases that should not normally happen.

Overall conclusion is that we need a set of robust e2e tests that will verify each officially supported name syntax. List of name patterns should be taken from https://github.com/grpc/grpc/blob/master/doc/naming.md. This should ensure that any fix we implement will not break any other case and confirm that revert and future reimplementation will work properly with authority header.

@serathius
Copy link
Member

I have confirmed that both etcd and grpc never supported passing correct authority header in multi endpoint scenario. Even before v3.5 (#12671) changes to grpc loadbalancing we always send same authority based on first endpoint to all other endpoints. This was always limitation of golang grpc implementation, reason is that authority is always based on ClientConn struct that handles loadbalancing between multiple endpoints.

Code on grpc side:

I have confirmed behavior on etcd v3.4.16 with simple test:

Run etcd cluster with http2debug enabled (recommended to run each etcd in separate console)

GODEBUG=http2debug=2 bin/etcd-v3.4.16/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr &

GODEBUG=http2debug=2 bin/etcd-v3.4.16/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr &

GODEBUG=http2debug=2 bin/etcd-v3.4.16/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr &

Run etcdctl with multiple endpoints

./bin/etcdctl --endpoints=http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379 get a

Look for log

2021-09-28 15:31:46.783764 I | http2: decoded hpack field header field ":authority" = "127.0.0.1:2379"

No matter which member gets requests, authority is always equal to address of first member.

Based on this results I conclude that Etcd was never able to provide correct authority header in multiple endpoint scenario. I will implement fix as originally proposed by @ptabor. Please let me know if I missed something.

@tatsuhiro-t
Copy link
Author

I think the proposed fix is good enough to resolve this issue.

@Mane0007
Copy link

Open

1 similar comment
@Mane0007
Copy link

Open

@ptabor ptabor unpinned this issue Jan 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clientv3 priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Development

Successfully merging a pull request may close this issue.

9 participants