
Register privileges in Kibana Platform Security plugin and remove legacy getUser API. #65472

Merged
merged 13 commits into elastic:master from issue-xxx-np-migration
Jun 5, 2020

Conversation

azasypkin
Member

@azasypkin azasypkin commented May 6, 2020

Summary

In this PR we're:

  • Moving privileges registration from the legacy Security plugin to the Kibana Platform Security plugin, relying on Core's newly introduced Status service.
  • Completely removing remaining public API exposed by the legacy Security plugin.
  • Registering the Kibana Platform Security plugin with the SASS linter.
  • Removing the remaining dependency on injected vars (xpack.spaces.enabled).

Note to reviewers: it's a bit easier to review commit by commit (git move + changes).

If you received a review ping: the main X-Pack API integration test config was migrated to TS, which required fixing all references to it.

Blocked by: #65461

@azasypkin azasypkin added chore blocked Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! release_note:skip Skip the PR/issue when compiling release notes Feature:Security/Authorization Platform Security - Authorization Feature:NP Migration v7.9.0 labels May 6, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-security (Team:Security)

@azasypkin azasypkin force-pushed the issue-xxx-np-migration branch 2 times, most recently from d4c6f7b to e44e540 May 13, 2020 09:14
fetchedSpaces => setSpaces({ enabled: true, list: fetchedSpaces }),
(err: IHttpFetchError) => {
// Spaces plugin can be disabled and hence this endpoint can be unavailable.
if (err.response?.status === 404) {
Member Author

note: I moved this change to a separate commit as I want to hear your thoughts. It's "the least evil" solution I could come up with. Alternatively, we could introduce an endpoint that returns isSpacesEnabled based solely on the "Spaces Service" that Spaces registers with Security during setup.
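To make the intent of the snippet above concrete, here is a minimal self-contained sketch of the "404 means Spaces is disabled" approach. The names (`loadSpaces`, `SpacesState`, `FetchSpaces`) are hypothetical; in the PR the equivalent logic lives inline in the Security plugin's client code:

```typescript
// Hypothetical names; a sketch of treating a 404 from the Spaces endpoint
// as "Spaces plugin disabled", as the reviewed snippet does.
interface SpacesState {
  enabled: boolean;
  list: string[];
}

type FetchSpaces = () => Promise<string[]>;

async function loadSpaces(fetchSpaces: FetchSpaces): Promise<SpacesState> {
  try {
    const fetchedSpaces = await fetchSpaces();
    return { enabled: true, list: fetchedSpaces };
  } catch (err: any) {
    // Spaces plugin can be disabled and hence this endpoint can be unavailable.
    if (err?.response?.status === 404) {
      return { enabled: false, list: [] };
    }
    // Any other failure is a real error and should propagate.
    throw err;
  }
}
```

The design tradeoff under discussion: this infers "disabled" from a transport-level signal (HTTP 404) rather than from an explicit capability exposed by the Spaces plugin itself.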

Member

I think this is tolerable. I think it'd make sense (in a followup) for us to export a useSpaces hook from the Spaces plugin itself, so that we could at least have the Spaces plugin own this determination.

The disabled check aside, I could see other plugins benefiting from this hook too.

Member Author

Yeah, I agree

@azasypkin azasypkin marked this pull request as ready for review May 18, 2020 14:01
@azasypkin azasypkin requested review from a team as code owners May 18, 2020 14:01
validateFeaturePrivileges(allFeatures);
validateReservedPrivileges(allFeatures);

await registerPrivilegesWithCluster(
Member

I think we have a gap in error handling here. The legacy implementation had retry logic (with backoff) in the event of a transient failure, which I think we want here as well. If await registerPrivilegesWithCluster() throws an error, but neither this.status.core$ nor this.license.features$ emit a new value, then Kibana will be stuck in an unusable state from an authorization perspective. We will either have to wait for one of those two observables to change (if that ever happens), or Kibana will have to be restarted in order to reattempt privilege registration.

Member Author

That's a good point, thanks! I naively expected status service to catch intermittent 503s and emit new status updates, but it seems that's not the case since you're bringing this up 🙂 I'm not sure how I feel about having this logic (not simple, prone to errors) in every plugin though. Let me think a bit...

Member

That's a good point, thanks! I naively expected status service to catch intermittent 503s and emit new status updates

I think it will definitely catch some of these, but I could also imagine a race condition where Core thinks everything is OK, but then we get a network blip where we lose our connection for a very short period of time. Core might not catch this blip if it wasn't doing anything with ES at the time.

It's also possible that the ES connection would be technically ok, but if the cluster is starved for resources, then it's possible that our specific call to register privileges would fail (disk space watermark, readonly indices, etc.)

I'm not sure how I feel about having this logic (not simple, prone to errors) in every plugin though. Let me think a bit...

Yeah, that's definitely a tradeoff. That's probably one of the reasons we originally had this as a reusable utility, so that each plugin wouldn't have to implement its own.

Member Author

It's also possible that the ES connection would be technically ok, but if the cluster is starved for resources, then it's possible that our specific call to register privileges would fail (disk space watermark, readonly indices, etc.)

The thing that concerns me most is that Core determines when we first try to perform an action, but if that action fails, responsibility shifts to the consumer, which now decides when and whether it's appropriate to retry. It seems we don't have a way to force Core to check whether ES is available at the time we retry. If ES is struggling for whatever reason, we (every plugin that needs to retry) don't want to make things worse by bombarding it with requests. Ideally, Core should be the single source of truth that decides whether it's appropriate to retry.

Member

That's a fair point. @elastic/kibana-platform do you have any thoughts around this?

To summarize, we are waiting for Core to tell us when ES is ready via the exposed CoreStatus service. Once all prerequisites are met, we attempt to make a couple of calls to ES. If these checks fail though, we would like to be able to retry them at some point in the future, but it's possible that everything appears "fine" from the perspective of CoreStatus, so we will never get an updated value from the Observable.

Having each plugin implement its own retry logic is feeling a bit complex and error prone. Aleh also makes a good point that plugins shouldn't be exacerbating the problem by retrying whenever they feel it is appropriate, but instead should wait on a signal from Core. The problem is that we don't have a way to alert Core that something went wrong.

Contributor

@legrego I don't remember us discussing this case. I've added a comment to the status service meta-issue #41983 (comment)

Contributor

@cchaos cchaos left a comment

SASS change is fine. Thanks for adding the Sass linting!

let retryTimeout: NodeJS.Timeout;

// Register cluster privileges once Elasticsearch is available and Security plugin is enabled.
this.statusSubscription = combineLatest([
Member Author

note: presumably we'll get rid of this code in 7.10, so I gave up on learning how to properly leverage retryWhen here and went with a plain setTimeout approach 🙈
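For illustration, a minimal self-contained sketch of the plain setTimeout retry approach mentioned above: retry a failing operation with capped exponential backoff instead of an RxJS retryWhen pipeline. The helper name `retryWithBackoff` and the delay parameters are hypothetical, not the PR's actual implementation:

```typescript
// Sketch: retry an async operation with capped exponential backoff,
// using plain setTimeout rather than an RxJS retryWhen pipeline.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 100,
  maxDelayMs = 10_000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxRetries) {
        // Out of retries; surface the last error to the caller.
        throw err;
      }
      // Exponential backoff (100ms, 200ms, 400ms, ...), capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

The cap matters for the concern raised earlier in this thread: unbounded retries against a struggling ES cluster would only make things worse.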

@azasypkin
Member Author

@legrego should be ready for another pass, thanks!

@azasypkin azasypkin requested review from a team as code owners June 4, 2020 10:57
@azasypkin azasypkin requested a review from a team June 4, 2020 10:57
@azasypkin azasypkin requested a review from a team as a code owner June 4, 2020 10:57
@azasypkin azasypkin requested a review from a team June 4, 2020 10:57
@azasypkin azasypkin requested review from a team as code owners June 4, 2020 10:57
@botelastic botelastic bot added Team:APM All issues that need APM UI Team support Team:Fleet Team label for Observability Data Collection Fleet team Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability labels Jun 4, 2020
@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:apm)

@elasticmachine
Contributor

Pinging @elastic/uptime (Team:uptime)

@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

Contributor

@mikecote mikecote left a comment

Alerting changes LGTM

Member

@sorenlouv sorenlouv left a comment

apm changes lgtm

Member

@legrego legrego left a comment

note I'm running a flaky test suite for the api_integration ciGroup to make sure this is stable. I worry that we might not be immediately notified of the license change, or that the privilege registration will take longer than the subsequent API request to get the next set of privileges

https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/512/

Once the flaky runner completes, this LGTM! Thanks for the edits!

Contributor

@jen-huang jen-huang left a comment

Ingest manager README changes LGTM

@kibanamachine
Contributor

💚 Build Succeeded


@azasypkin
Member Author

note I'm running a flaky test suite for the api_integration ciGroup to make sure this is stable. I worry that we might not be immediately notified of the license change, or that the privilege registration will take longer than the subsequent API request to get the next set of privileges
https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/512/

Thanks! All 42 runs of the flaky test suite passed, but I'll keep an eye on this. If it starts failing, I'll just convert the final expect into a busy-waiting loop.
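For reference, a minimal sketch of what converting a one-shot expect into a busy-waiting loop could look like: poll the assertion until it passes or a deadline elapses, then rethrow the last failure. The helper name `waitUntil` and the timeout values are hypothetical, not from the PR:

```typescript
// Sketch: poll an assertion until it passes or the timeout elapses.
// On timeout, the most recent assertion failure is rethrown.
async function waitUntil(
  assertion: () => Promise<void> | void,
  timeoutMs = 30_000,
  intervalMs = 500
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      await assertion();
      return; // assertion passed
    } catch (err) {
      if (Date.now() >= deadline) {
        throw err; // give up and surface the last failure
      }
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  }
}
```

This pattern tolerates the race the review mentioned: privilege registration may finish slightly after the test's API request, so the test retries the assertion instead of failing on the first stale read.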

@azasypkin azasypkin merged commit c6e2fed into elastic:master Jun 5, 2020
@azasypkin azasypkin deleted the issue-xxx-np-migration branch June 5, 2020 06:12
azasypkin added a commit to azasypkin/kibana that referenced this pull request Jun 5, 2020
…acy `getUser` API. (elastic#65472)

# Conflicts:
#	x-pack/README.md
#	x-pack/plugins/uptime/README.md
#	x-pack/scripts/functional_tests.js
@azasypkin
Member Author

7.x/7.9.0: 2890d5b
