Make identity store loading and alias merging deterministic #28867

Merged: banks merged 7 commits into main from f/id-determinism on Nov 11, 2024

@banks banks commented Nov 8, 2024

Description

To optimize loading entities into the IdentityStore during unseal, we currently load the 256 buckets from storage in parallel. We then process them in whatever order they load. This non-determinism should be fine because each entity should be loaded by ID, so the loading order shouldn't matter.

But in real life, historical (and potentially current) bugs can cause there to be duplicated aliases in storage. We have for many years attempted to clean up and merge such problematic duplicate aliases on load; however, we always merge them in the order they are encountered during loading. Because this order is non-deterministic, different aliases can "win" the merge process after different unseals, causing unpredictable behaviour. It will often also mean that Enterprise Performance Standbys end up with a different view of the entities than the active node, causing inconsistent results depending on which node responds to a request.

This PR retains the parallel loading optimization but fixes the order of processing loaded buckets to ensure that all nodes will resolve any duplicates identically.
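
The idea can be illustrated with a minimal Go sketch. This is not the code from this PR: `aliasRecord`, `loadBucket`, and `resolveAliases` are hypothetical names, the storage layer is simulated with a sleep, and the "first alias seen wins" policy is an assumption made only for illustration. Each bucket is loaded in its own goroutine, but results are consumed strictly in bucket-index order, so duplicates resolve the same way on every run.

```go
package main

import (
	"fmt"
	"time"
)

// aliasRecord is a hypothetical, simplified view of one stored alias.
type aliasRecord struct {
	Name     string
	EntityID string
}

// loadBucket is a hypothetical loader. It simulates storage latency that
// is longest for bucket 0, so completion order differs from index order.
func loadBucket(i int, data [][]aliasRecord) []aliasRecord {
	time.Sleep(time.Duration(len(data)-i) * time.Millisecond)
	return data[i]
}

// resolveAliases loads all buckets concurrently but processes them
// strictly in index order. Assumption for this sketch: when an alias
// name is duplicated, the first one processed keeps ownership.
func resolveAliases(data [][]aliasRecord) map[string]string {
	results := make([]chan []aliasRecord, len(data))
	for i := range results {
		// One buffered channel per bucket lets each loader finish
		// without waiting for the consumer to reach its index.
		results[i] = make(chan []aliasRecord, 1)
		go func(i int) {
			results[i] <- loadBucket(i, data)
		}(i)
	}

	owners := make(map[string]string)
	for i := range results {
		// Blocks until bucket i is loaded, even if later buckets
		// finished earlier.
		for _, a := range <-results[i] {
			if _, seen := owners[a.Name]; !seen {
				owners[a.Name] = a.EntityID
			}
		}
	}
	return owners
}

func main() {
	data := [][]aliasRecord{
		{{Name: "alice", EntityID: "entity-1"}},
		{{Name: "bob", EntityID: "entity-2"}},
		{{Name: "alice", EntityID: "entity-3"}}, // duplicate alias
	}
	fmt.Println(resolveAliases(data)["alice"]) // prints "entity-1"
}
```

In this sketch the consumer may sit idle while it waits for a slow early bucket, which corresponds to the theoretical worst case discussed below.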

We've reviewed this in the Enterprise PR extensively and performed performance testing that shows that even though there is a theoretical worst-case that might make this new approach slower (say if the first bucket takes a long time to load), in practice it's not measurably different (mainly because the current code doesn't realise ideal parallelism anyway due to contention in other layers of storage).

This should fix issues where duplicates (caused by other bugs) then cause inconsistent responses from different servers.

JIRA: VAULT-31384
Ent PR: https://github.com/hashicorp/vault-enterprise/pull/6776
RFC: https://docs.google.com/document/d/16Tbsngmzg9tuJu1G8s1uSJGDvp-YS5UUUVVboDvuQpc/edit?tab=t.0

TODO only if you're a HashiCorp employee

  • Backport Labels: If this PR is in the ENT repo and needs to be backported, backport
    to N, N-1, and N-2, using the backport/ent/x.x.x+ent labels. If this PR is in the CE repo, you should only backport to N, using the backport/x.x.x label, not the enterprise labels.
    • If this fixes a critical security vulnerability or severity 1 bug, it will also need to be backported to the current LTS versions of Vault. To ensure this, use all available enterprise labels.
  • ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature
    of a public function, even if that change is in a CE file, double check that
    applying the patch for this PR to the ENT repo and running tests doesn't
    break any tests. Sometimes ENT only tests rely on public functions in CE
    files.
  • Jira: If this change has an associated Jira, it's referenced either
    in the PR description, commit message, or branch name.
  • RFC: If this change has an associated RFC, please link it in the description.
  • ENT PR: If this change has an associated ENT PR, please link it in the
    description. Also, make sure the changelog is in this PR, not in your ENT PR.

@github-actions github-actions bot added the hashicorp-contributed-pr label Nov 8, 2024
@banks banks added this to the 1.19.0-rc milestone Nov 8, 2024

github-actions bot commented Nov 8, 2024

CI Results:
All Go tests succeeded! ✅

@banks banks marked this pull request as ready for review November 8, 2024 16:19

github-actions bot commented Nov 8, 2024

Build Results:
All builds succeeded! ✅


@mpalmi mpalmi left a comment

Looks great, Paul! Just FYI: I cherry-picked the squashed commit in enterprise and it builds just fine.

@banks banks merged commit 1aa9a7a into main Nov 11, 2024
91 checks passed
@banks banks deleted the f/id-determinism branch November 11, 2024 15:53