Authorization error in CCR results in node fatal error #84006
Comments
Pinging @elastic/es-security (Team:Security) |
Pinging @elastic/es-distributed (Team:Distributed) |
It looks like this is an intentional assertion failure for CCR. Line 541 in a2bc485
We explicitly I haven't investigated far enough to know whether this failure is real, and whether it's caused by a change in security or CCR, or something else. But it does look like we're very intentionally killing the node if CCR hits security failures during testing. |
The unauthorized action is But it's weird that no one has ever complained about it. It is also interesting that we haven't had a test failure until now. |
I suspect we have, but the actual assertion error is buried in the node logs. That's one fundamental flaw with test-time assertions in production code: they don't surface in test errors. If the node blows up due to an assertion error, you'll just get random test failures caused by difficulty talking to the test cluster. We don't otherwise store these node logs in a searchable way, so it's basically impossible to know how many tests have failed for this reason. |
Today the `ShardFollowTasksExecutor` enters system context before renewing a retention lease, but then makes the remote call using a client which replaces the thread context with a non-system one again. This commit removes this no-op code to clarify the security model in this area. Relates elastic#61308, elastic#84006, elastic#84156
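To illustrate why the commit message above calls the removed code a no-op, here is a minimal, single-threaded sketch. Every name in it (`ThreadContextSketch`, `ContextReplacingClient`, `renewInsideSystemContext`, `renewDirectly`) is hypothetical and chosen only for illustration; none of it is the Elasticsearch API, and the real thread context is considerably more involved. It only models the ordering described: the caller enters system context, but the client swaps in a non-system context before the call is made.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a thread context; NOT the Elasticsearch class.
final class ThreadContextSketch {
    private boolean systemContext = false;

    boolean isSystemContext() {
        return systemContext;
    }

    // Runs the body with the given flag, then restores the previous value.
    private boolean runWith(boolean asSystem, BooleanSupplier body) {
        boolean previous = systemContext;
        systemContext = asSystem;
        try {
            return body.getAsBoolean();
        } finally {
            systemContext = previous;
        }
    }

    boolean asSystem(BooleanSupplier body) {
        return runWith(true, body);
    }

    boolean asUser(BooleanSupplier body) {
        return runWith(false, body);
    }
}

// Hypothetical client that always replaces the context with a non-system one.
final class ContextReplacingClient {
    private final ThreadContextSketch ctx;

    ContextReplacingClient(ThreadContextSketch ctx) {
        this.ctx = ctx;
    }

    // Returns whether the "remote call" observed a system context.
    boolean execute() {
        return ctx.asUser(ctx::isSystemContext);
    }
}

final class SystemContextNoOp {
    // Assumed shape of the old code path: enter system context, then call the client.
    static boolean renewInsideSystemContext(ThreadContextSketch ctx, ContextReplacingClient client) {
        return ctx.asSystem(client::execute);
    }

    // Assumed shape after the cleanup: call the client directly.
    static boolean renewDirectly(ContextReplacingClient client) {
        return client.execute();
    }

    public static void main(String[] args) {
        ThreadContextSketch ctx = new ThreadContextSketch();
        ContextReplacingClient client = new ContextReplacingClient(ctx);
        System.out.println("inside system context: " + renewInsideSystemContext(ctx, client));
        System.out.println("direct call:           " + renewDirectly(client));
    }
}
```

In this sketch both paths report the same thing: the call never sees a system context, so wrapping it in `asSystem` changes nothing about how the call is authorized.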
Closing as a duplicate of #84156 which has more context. |
I might be misinterpreting what happened here, but I encountered a build where a security exception resulted in a "fatal error", which as I understand it is unrecoverable and causes the node to shut down. Of course, this meant other tests failed due to connection exceptions, as the test node had exited. I'm not exactly sure of the nature of the error, but I wouldn't expect an auth error to result in the node exiting completely.
Here are the relevant node logs: