
Etcd snapshots retention when node name changes #8099

Merged (4 commits) on Aug 3, 2023

vitorsavian (Member)

Proposed Changes

  • Delete the oldest snapshot files on the node and in S3 when the snapshot count exceeds the retention limit, regardless of the node name.

Types of Changes

  • Fix

Verification

Create a cluster with etcd snapshots and S3 enabled:

k3s server --cluster-init --etcd-snapshot-schedule-cron "*/1 * * * *" --etcd-snapshot-retention 2 --etcd-s3 --etcd-s3-access-key <access-key> --etcd-s3-secret-key <secret-key> --etcd-s3-bucket <bucket>

Then list /var/lib/rancher/k3s/server/db/snapshots to confirm that the number of snapshots stays within the retention limit:

ls /var/lib/rancher/k3s/server/db/snapshots

If you are running on AWS, you can reboot the machine so that it comes up with a different hostname (and therefore a different node name) and restart the cluster. k3s will still delete old snapshots in /var/lib/rancher/k3s/server/db/snapshots and keep the count within the retention limit.

Testing

Linked Issues

User-Facing Change

NONE

Further Comments

vitorsavian requested a review from a team as a code owner on August 2, 2023 14:54
dereknola (Member) left a comment:

S3 E2E passes with this PR:

Ran 8 of 8 Specs in 115.931 seconds
SUCCESS! -- 8 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: Test_E2ES3 (115.93s)
PASS
ok  	command-line-arguments	115.934s

codecov bot commented Aug 2, 2023

Codecov Report

Patch coverage: 66.66% and project coverage change: +7.05% 🎉

Comparison is base (8c38d11) 44.45% compared to head (c9103b2) 51.51%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8099      +/-   ##
==========================================
+ Coverage   44.45%   51.51%   +7.05%     
==========================================
  Files         140      143       +3     
  Lines       14508    14568      +60     
==========================================
+ Hits         6449     7504    +1055     
+ Misses       6963     5873    -1090     
- Partials     1096     1191      +95     
Flag        Coverage Δ
e2etests    49.31% <0.00%> (?)
inttests    44.39% <66.66%> (-0.06%) ⬇️
unittests   20.05% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Files Changed       Coverage Δ
pkg/etcd/s3.go      0.00% <0.00%> (ø)
pkg/etcd/etcd.go    45.52% <100.00%> (+6.32%) ⬆️

... and 54 files with indirect coverage changes


vitorsavian merged commit ca7aeed into k3s-io:master on Aug 3, 2023
6 checks passed
vitorsavian added a commit to vitorsavian/k3s that referenced this pull request Aug 3, 2023
Fixed the etcd retention to delete orphaned snapshots

Signed-off-by: Vitor <[email protected]>
brandond (Member) commented Aug 3, 2023

I think we should go ahead with this change as-is, and document and/or release-note that multiple clusters should not store snapshots in the same bucket+prefix. Doing so already causes problems, because snapshots cross-populate between clusters that store them in the same location. With this change, clusters that share a location will also enforce retention limits on each other's files, and this is working as designed.

In the future, as part of work on #8064, we will store additional metadata alongside the snapshot files in order to allow RKE2 to determine whether or not the snapshot file is owned by the cluster.
