Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server/auth: users/authentication data not loaded on member startup #14355

Closed
Tracked by #14138 ...
vivekpatani opened this issue Aug 16, 2022 · 14 comments
Closed
Tracked by #14138 ...

server/auth: users/authentication data not loaded on member startup #14355

vivekpatani opened this issue Aug 16, 2022 · 14 comments

Comments

@vivekpatani
Copy link
Contributor

vivekpatani commented Aug 16, 2022

What happened?

  1. Start a fresh 3.5 cluster
  2. Enable auth
  3. Create a key and watch
  4. Restart the 3.5 cluster

Run into issue.

What did you expect to happen?

User/Auth to work seamlessly/without issues.

How can we reproduce it (as minimally and precisely as possible)?

Setup

git clone https://github.com/etcd-io/etcd.git
cd etcd
git checkout release-3.5

Docker etcd server

docker run --platform linux/amd64 -P -it --rm -v /path/to/etcd:/go/src/go.etcd.io/etcd docker.io/golang:1.16.15
cd src/go.etcd.io/etcd
./build
./bin/etcd --logger zap --log-level debug

Docker etcd client

docker ps -a
docker exec -it <container_id> bash ## container ID of the server
cd src/go.etcd.io/etcd
Follow instructions given here: https://etcd.io/docs/v3.5/op-guide/authentication/authentication/
password: badboi (just for reference)

Shutdown the Docker etcd server by pressing Ctrl + C and typing exit and return

Docker etcd server - restart

docker run --platform linux/amd64 -P -it --rm -v /path/to/etcd:/go/src/go.etcd.io/etcd docker.io/golang:1.16.15
./build.sh
./bin/etcd --log-level debug

Docker etcd client - restart

docker ps -a
docker exec -it <container_id> bash ## container ID of the server
cd src/go.etcd.io/etcd
./bin/etcdctl watch foo --user=user0:badboi

Client side log
watch was canceled (rpc error: code = PermissionDenied desc = etcdserver: permission denied)
Error: watch is canceled by the server

Server side log
.
.
{"level":"error","ts":"2022-08-16T04:08:30.961Z","caller":"auth/range_perm_cache.go:114","msg":"user doesn't exist","user-name":"user0","stacktrace":"go.etcd.io/etcd/server/v3/auth.(*authStore).isRangeOpPermitted\n\t/go/src/go.etcd.io/etcd/server/auth/range_perm_cache.go:114\ngo.etcd.io/etcd/server/v3/auth.(*authStore).isOpPermitted\n\t/go/src/go.etcd.io/etcd/server/auth/store.go:870\ngo.etcd.io/etcd/server/v3/auth.(*authStore).IsRangePermitted\n\t/go/src/go.etcd.io/etcd/server/auth/store.go:882\ngo.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).isWatchPermitted\n\t/go/src/go.etcd.io/etcd/server/etcdserver/api/v3rpc/watch.go:235\ngo.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop\n\t/go/src/go.etcd.io/etcd/server/etcdserver/api/v3rpc/watch.go:269\ngo.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2\n\t/go/src/go.etcd.io/etcd/server/etcdserver/api/v3rpc/watch.go:189"}
{"level":"debug","ts":"2022-08-16T04:08:30.967Z","caller":"v3rpc/watch.go:191","msg":"failed to receive watch request from gRPC stream","error":"rpc error: code = Canceled desc = context canceled"}
.
.

Anything else we need to know?

Working on this, trying to fix this.

Etcd version (please run commands below)

$ etcd --version
3.5.4

$ etcdctl version
3.5.4

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

> Client side log
watch was canceled (rpc error: code = PermissionDenied desc = etcdserver: permission denied)
Error: watch is canceled by the server

> Server side log
.
.
{"level":"error","ts":"2022-08-16T04:08:30.961Z","caller":"auth/range_perm_cache.go:114","msg":"user doesn't exist","user-name":"user0","stacktrace":"go.etcd.io/etcd/server/v3/auth.(*authStore).isRangeOpPermitted\n\t/go/src/go.etcd.io/etcd/server/auth/range_perm_cache.go:114\ngo.etcd.io/etcd/server/v3/auth.(*authStore).isOpPermitted\n\t/go/src/go.etcd.io/etcd/server/auth/store.go:870\ngo.etcd.io/etcd/server/v3/auth.(*authStore).IsRangePermitted\n\t/go/src/go.etcd.io/etcd/server/auth/store.go:882\ngo.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).isWatchPermitted\n\t/go/src/go.etcd.io/etcd/server/etcdserver/api/v3rpc/watch.go:235\ngo.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop\n\t/go/src/go.etcd.io/etcd/server/etcdserver/api/v3rpc/watch.go:269\ngo.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2\n\t/go/src/go.etcd.io/etcd/server/etcdserver/api/v3rpc/watch.go:189"}
{"level":"debug","ts":"2022-08-16T04:08:30.967Z","caller":"v3rpc/watch.go:191","msg":"failed to receive watch request from gRPC stream","error":"rpc error: code = Canceled desc = context canceled"}
.
.
@vivekpatani
Copy link
Contributor Author

vivekpatani commented Aug 17, 2022

IMO this is a critical issue, much more so than I initially thought it to be. @ahrtr @mitake @serathius

Actually I stand corrected, this issue will break ANY combination of watches + auth. An easier way to reproduce this is:

  1. Checkout this 9d7e108
  2. Then startup etcd
  3. Use a client and follow: https://etcd.io/docs/v3.5/op-guide/authentication/authentication/
  4. ./bin/etcdctl watch foo --user=user0:123
  5. Restart the cluster
  6. ./bin/etcdctl watch foo --user=user0:123

@vivekpatani
Copy link
Contributor Author

To also make sure this didn't exist before that commit, just pick a commit before this

  1. Checkout this 3237289
  2. Use a client and follow: etcd.io/docs/v3.5/op-guide/authentication/authentication
  3. ./bin/etcdctl watch foo --user=user0:123
  4. Restart the cluster
  5. ./bin/etcdctl watch foo --user=user0:123

@vivekpatani
Copy link
Contributor Author

I'm trying to work on a fix, just wanted to bring this to your/communities attention. Thanks.

@vivekpatani vivekpatani changed the title server/auth: upgrade from 3.4 -> 3.5 causes users/authentication loss server/auth: users/authentication loss after member restart Aug 17, 2022
@ahrtr
Copy link
Member

ahrtr commented Aug 17, 2022

Thanks @vivekpatani reporting this.

Confirmed that I can reproduce this even on main, and it has nothing to do with watch. The reproduce steps are pretty simple,

  1. Start etcd server;
  2. enable auth per guide https://etcd.io/docs/v3.5/op-guide/authentication/authentication/ ;
  3. Restart etcd server;
  4. Execute command etcdctl --user=user0:user0 get foo, then you will get error something like below,
$ etcdctl --user=user0:user0 get foo
{"level":"warn","ts":"2022-08-18T05:06:58.574+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:64","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004921e0/127.0.0.1:2379","method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = PermissionDenied desc = etcdserver: permission denied"}
Error: etcdserver: permission denied

When you add another user or role, then the issue disappear. The reason should be that etcd doesn't load the permission correctly on startup. cc @mitake

@ahrtr
Copy link
Member

ahrtr commented Aug 17, 2022

I just checked the source code, and it seems that etcd doesn't load the permission at all on startup. Instead, it just load on receiving new auth requests, i.e. authEnable/Disable, adding/Deleting users/roles etc..

It should be a minor fix although it's a major issue.

@vivekpatani
Copy link
Contributor Author

Thanks for confirming and the insight.

@ahrtr can you point me to the code you're looking at?

Do you want me to work on this or would you like to take this up? Just want to make sure we're not duplicating work.

@ahrtr
Copy link
Member

ahrtr commented Aug 17, 2022

@vivekpatani you just need to call refreshRangePermCache on etcd startup.

Please feel free to deliver a PR for this.

@vivekpatani
Copy link
Contributor Author

Thanks @ahrtr will do!

@mitake
Copy link
Contributor

mitake commented Aug 17, 2022

So sorry for missing such a large issue… thanks for reporting @vivekpatani
I’ll consider how to prevent such an issue in future, like enhancing test cases for restarting…

vivekpatani pushed a commit to vivekpatani/etcd that referenced this issue Aug 18, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
@vivekpatani
Copy link
Contributor Author

@mitake do not worry, it was caught before we cut a release, testing this was also difficult because we needed to restart to catch this.

@ahrtr @mitake #14358, please feedback if this is adequate.

@vivekpatani
Copy link
Contributor Author

Once approved, will also backport to 3.4/3.5. Thanks.

@mitake
Copy link
Contributor

mitake commented Aug 18, 2022

@vivekpatani yeah, but I should prepare some mechanisms like e2e testing for preventing similar issue in future…

vivekpatani added a commit to vivekpatani/etcd that referenced this issue Aug 22, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355

Signed-off-by: vivekpatani <[email protected]>
vivekpatani added a commit to vivekpatani/etcd that referenced this issue Aug 23, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355

Signed-off-by: vivekpatani <[email protected]>
vivekpatani added a commit to vivekpatani/etcd that referenced this issue Aug 23, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355

Signed-off-by: vivekpatani <[email protected]>
vivekpatani added a commit to vivekpatani/etcd that referenced this issue Aug 24, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355

Signed-off-by: vivekpatani <[email protected]>
vivekpatani pushed a commit to vivekpatani/etcd that referenced this issue Aug 31, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358
vivekpatani pushed a commit to vivekpatani/etcd that referenced this issue Aug 31, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358

Signed-off-by: vivekpatani <[email protected]>
vivekpatani added a commit to vivekpatani/etcd that referenced this issue Aug 31, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358

Signed-off-by: vivekpatani <[email protected]>
vivekpatani pushed a commit to vivekpatani/etcd that referenced this issue Sep 7, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358

Signed-off-by: vivekpatani <[email protected]>
vivekpatani pushed a commit to vivekpatani/etcd that referenced this issue Sep 7, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358

Signed-off-by: vivekpatani <[email protected]>
@ahrtr ahrtr mentioned this issue Sep 8, 2022
8 tasks
@vivekpatani
Copy link
Contributor Author

@ahrtr @serathius should we close this now?

@ahrtr
Copy link
Member

ahrtr commented Sep 9, 2022

Thanks @vivekpatani

Resolved in 3.4/3.5/main.

@ahrtr ahrtr closed this as completed Sep 9, 2022
@ahrtr ahrtr changed the title server/auth: users/authentication loss after member restart server/auth: users/authentication data not loaded on member startup Sep 9, 2022
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/etcd that referenced this issue Oct 7, 2022
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358

Signed-off-by: vivekpatani <[email protected]>
tjungblu pushed a commit to tjungblu/etcd that referenced this issue Jul 26, 2023
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358

Signed-off-by: vivekpatani <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants