Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docdb] PITR: Implement delete_schedule and edit_schedule APIs #8417

Closed
bmatican opened this issue May 12, 2021 · 9 comments
Closed

[docdb] PITR: Implement delete_schedule and edit_schedule APIs #8417

bmatican opened this issue May 12, 2021 · 9 comments
Assignees
Labels
area/docdb YugabyteDB core features good first issue This is a good issue to start contributing! kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@bmatican
Copy link
Contributor

bmatican commented May 12, 2021

Jira Link: DB-2166
For edit, it would be tricky to expose modifying the filter, as we might expand to capture tables not previously captured, for which we wouldn't have old snapshots. Given that, let's just have an ability to change interval and retention.

For delete, we need to confirm that all relevant GC work is done appropriately

  • TS side snapshots are removed
  • hidden tablets are removed, if needed
@bmatican bmatican added the area/docdb YugabyteDB core features label May 12, 2021
spolitov added a commit that referenced this issue Jun 17, 2021
Summary:
This diff adds the ability to delete snapshot schedules.

There is a tricky part in detecting deleted schedules on the TServer side.
Suppose at some point TSTabletManager does not know schedule1.
Since it knows schedules received by heartbeat there are 2 possible options:
1) It is a new schedule that was not yet received by heartbeat.
2) It is deleted schedule.
So to distinguish between those 2 cases it uses the following logic.
Each time the heartbeat response is received and the schedules list is updated, we also increment the `snapshot_schedules_version_` field.
All missing schedules are added to the special map, along with the current value of `snapshot_schedules_version_`.
So when we again find the schedule as missing, we could compare the current `snapshot_schedules_version_` and version that we had when the schedule was first found as missing.
So if the master does not know this schedule also it means that it is an old schedule that was deleted.

But the following could happen:
Heartbeat processed by the master, but response not yet processed by tserver. The new schedule is created and sent to the tablet.
Then the tablet receives the response to this heart, and it would not contain such a schedule.
To avoid interpreting such schedule as deleted we wait that `snapshot_schedules_version_` to be incremented twice, before marking schedule as deleted at tserver.

Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.Delete

Reviewers: skedia, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D11868
spolitov added a commit that referenced this issue Jun 20, 2021
Summary:
This diff adds the ability to delete snapshot schedules.

There is a tricky part in detecting deleted schedules on the TServer side.
Suppose at some point TSTabletManager does not know schedule1.
Since it knows schedules received by heartbeat there are 2 possible options:
1) It is a new schedule that was not yet received by heartbeat.
2) It is deleted schedule.
So to distinguish between those 2 cases it uses the following logic.
Each time the heartbeat response is received and the schedules list is updated, we also increment the `snapshot_schedules_version_` field.
All missing schedules are added to the special map, along with the current value of `snapshot_schedules_version_`.
So when we again find the schedule as missing, we could compare the current `snapshot_schedules_version_` and version that we had when the schedule was first found as missing.
So if the master does not know this schedule also it means that it is an old schedule that was deleted.

But the following could happen:
Heartbeat processed by the master, but response not yet processed by tserver. The new schedule is created and sent to the tablet.
Then the tablet receives the response to this heart, and it would not contain such a schedule.
To avoid interpreting such schedule as deleted we wait that `snapshot_schedules_version_` to be incremented twice, before marking schedule as deleted at tserver.

Original commit: D11868 / 7453192

Test Plan:
ybd --gtest_filter YbAdminSnapshotScheduleTest.Delete
Jenkins: rebase: 2.6

Reviewers: skedia, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D12000
@sanketkedia
Copy link
Contributor

sanketkedia commented Sep 27, 2021

This is a good intro task to get familiar with the basics like how to add a client-side interface, define protobufs for communication between client and server, make RPC calls, understand the translation of table rows to rocksdb key-value pairs, perform a simple sys catalog write, etc. delete_schedule has already been implemented. For edit_schedule, typically the following changes are needed:

  1. Define the format of the edit_schedule command and add it in yb-admin. The format should be such that it takes in the id of the snapshot schedule that the user wants to edit, allows for updating the interval and/or the retention. To add the command look at this similar code

  2. Perform client-side validation checks such as the id should exist, the format should be valid, etc, and error out accordingly. For verifying if the snapshot schedule id is valid, you can invoke the ListSnapshotSchedules RPC. See here for how to invoke the ListSnapshotSchedules RPC.

  3. Define and Invoke the EditSnapshotSchedule RPC to send the request to the server (intercepted by the catalog manager of the master). You have to define this RPC as well as the corresponding request and response protobufs. For defining the protobufs look at this proto file

  4. At the server-side, update the corresponding entry in the sys catalog to reflect the new interval and retention. Look at this code that performs a similar write to the sys catalog

  5. Update the in-memory schedules_ structure to reflect the new values. You can use this function to update by loading the new entry from the sys catalog

  6. Write tests that validate that snapshots are getting created once every new interval. Also, old snapshots should get garbage collected every retention period. You can add a test here

cc @bmatican @spolitov

@jordans6
Copy link

Hey, @bmatican, if no one is working on this task currently I'd love to pick it up!

@bmatican
Copy link
Contributor Author

@jordans6 Glad to hear you're excited to be contributing to YB. That sounds great, hopefully @sanketkedia 's message above helps as a starting point!

If you need anymore help, feel free to join our public slack and ask questions on the #contributors channel!

@jordans6
Copy link

Awesome, I'm looking forward to contributing! Would you be able to send me a link to the slack channel? The one in the readme gives me the following error message:
"This link is no longer active. To join this workspace, you’ll need to ask the person who originally invited you for a new link."

@SrivastavaAnubhav
Copy link
Contributor

Hi @jordans6, here's a link for the slack: https://communityinviter.com/apps/yugabyte-db/register
(I'll fix the README link in a bit.)

@jordans6
Copy link

Thanks, I've joined the slack now.

@lingamsandeep lingamsandeep assigned druzac and unassigned jordans6 May 5, 2022
@druzac
Copy link
Contributor

druzac commented May 19, 2022

Is an interval of 0 or retention_duration of 0 valid? create_snapshot_schedule currently accepts both. I suppose the semantics for a 0 interval are as soon as possible, and the semantics of a 0 retention_duration are an infinite TTL?

@sanketkedia
Copy link
Contributor

That's good point @druzac I would say we can just disallow 0 as the interval or retention instead of getting into the confusion and having to explain to users as to what means what?

@sanketkedia
Copy link
Contributor

cc @vkulichenko

@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 9, 2022
aishwarya24 pushed a commit to aishwarya24/yugabyte-db that referenced this issue Aug 12, 2022
Summary:
Added edit_snapshot_schedule command to yb-admin, along with an RPC in master to edit a snapshot schedule. Moved validation of schedule invariants on the CreateSnapshotSchedule path from the command line tool to the server request handling code. Also added checks for 0 interval and 0 retention snapshot schedules to the CreateSnapshotSchedule path.

Refactored MasterSnapshotCoordinator and SnapshotScheduleState to de-duplicate serialization to docdb.

Test Plan:
YbAdminSnapshotScheduleTest.EditInterval
YbAdminSnapshotScheduleTest.EditRetention
YbAdminSnapshotScheduleTest.EditSnapshotScheduleCheckOptions
YbAdminSnapshotScheduleTest.EditIntervalZero
YbAdminSnapshotScheduleTest.EditRetentionZero
YbAdminSnapshotScheduleTest.EditRepeatedInterval
YbAdminSnapshotScheduleTest.EditRepeatedRetention
YbAdminSnapshotScheduleTest.EditIntervalLargerThanRetention
YbAdminSnapshotScheduleTest.CreateIntervalZero
YbAdminSnapshotScheduleTest.CreateRetentionZero
YbAdminSnapshotScheduleTest.CreateIntervalLargerThanRetention
YbAdminSnapshotScheduleTest.EditIntervalAndRetention

Also played around with a test cluster on the command line.

Reviewers: skedia

Reviewed By: skedia

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17087
@druzac druzac closed this as completed Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features good first issue This is a good issue to start contributing! kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

7 participants