Add a new Readyset command to snapshot a new table #1369

Open
altmannmarcelo opened this issue Sep 12, 2024 · 5 comments
Labels
Medium priority Created by Linear-GitHub Sync

Comments

@altmannmarcelo
Contributor

Description

When replication filters are used to select which tables to snapshot, adding a new table to the replicated set currently requires restarting ("bouncing") the Readyset instance.

We should create a new command that adds a table to the set of replicated tables (and snapshots it) without a restart.

Change in user-visible behavior

Requires documentation change

@altmannmarcelo
Contributor Author

I don't agree. We should give users the ability to select which tables they want to snapshot and add new tables later, while still allowing them to snapshot everything.
With your idea, if I know I won't use a table that holds terabytes of data, why should we snapshot it first only to discard it later?

@davisjc
Contributor

davisjc commented Sep 16, 2024

I agree we should avoid snapshotting a table if we'll only then discard it later. I suppose I'd want an easy way to specify the blacklist in advance.

The whole problem seems very similar to the --replication-tables and --replication-tables-ignore arguments, which are all about framing this as either a whitelist or a blacklist.

Ignoring the current particulars of how ReadySet does snapshotting, the blacklisting mentality makes a lot of sense to me (typically presume everything is replicated with a handful of exceptions, which are blacklisted).

Explicitly whitelisting the tables you want replicated could make sense in some situations too, but I don't expect that direction to be as common or desirable. It seems like another operational step customers must do whenever they add a new table to their application.
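To make the whitelist/blacklist framing concrete, here is a minimal Python sketch of how an allowlist (as with --replication-tables) and a blocklist (as with --replication-tables-ignore) could decide whether a table is snapshotted. The ReplicationFilter class and should_replicate method are hypothetical, and letting the ignore list win when both lists name a table is an assumption, not documented Readyset behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationFilter:
    """Hypothetical model of table filtering; not Readyset's actual implementation."""

    # Allowlist, as with --replication-tables. Empty means "replicate every table".
    include: set[str] = field(default_factory=set)
    # Blocklist, as with --replication-tables-ignore.
    ignore: set[str] = field(default_factory=set)

    def should_replicate(self, table: str) -> bool:
        # Assumption: an ignored table is never replicated, even if also allowlisted.
        if table in self.ignore:
            return False
        # With no allowlist, everything that is not ignored gets replicated.
        return not self.include or table in self.include


upstream = ["public.orders", "public.users", "public.audit_log"]
allowlist = ReplicationFilter(include={"public.orders"})
blocklist = ReplicationFilter(ignore={"public.audit_log"})
print([t for t in upstream if allowlist.should_replicate(t)])  # ['public.orders']
print([t for t in upstream if blocklist.should_replicate(t)])  # ['public.orders', 'public.users']
```

In the blocklist form, a table the user never mentions is replicated by default, which matches the "everything with a handful of exceptions" mentality.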

I'm not sure what I'm proposing yet, so I'm thinking out loud here, but maybe it would make sense to have a way to start ReadySet for the first time (without implicitly also starting replication), examine the upstream tables, define a blacklist that makes sense, and then tell ReadySet to start snapshotting/replicating.

After the initial setup, I'd expect we'd want ReadySet to continue snapshotting and replication by default on subsequent process launches.

@altmannmarcelo
Contributor Author

We need to allow for both use cases:

  • I have 1k tables and want to replicate 10: use --replication-tables on those 10 tables
  • I have 1k tables and want to replicate 990: use --replication-tables-ignore on the other 10 tables

After Readyset has started, I want to be able to add a new table to either of those lists. We should have a command for that, and that is what this ticket is about.
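A minimal sketch of what the proposed command might do to the filter state, assuming both lists are held in memory and can be changed at runtime. TableFilter, add_replicated_table, and add_ignored_table are hypothetical names; the boolean results only signal whether a snapshot must start or already-replicated data must be dropped, and how Readyset would carry that out is not specified in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class TableFilter:
    """Hypothetical runtime-mutable replication filter (not Readyset's real API)."""

    include: set[str] = field(default_factory=set)  # empty => replicate all tables
    ignore: set[str] = field(default_factory=set)

    def is_replicated(self, table: str) -> bool:
        return table not in self.ignore and (not self.include or table in self.include)

    def add_replicated_table(self, table: str) -> bool:
        """Add `table` to the replicated set; True means a snapshot must start."""
        needs_snapshot = not self.is_replicated(table)
        self.ignore.discard(table)
        if self.include:
            # Allowlist mode: extend the list. With an empty allowlist,
            # removing the table from the ignore list is enough.
            self.include.add(table)
        return needs_snapshot

    def add_ignored_table(self, table: str) -> bool:
        """Stop replicating `table`; True means existing data must be dropped."""
        was_replicated = self.is_replicated(table)
        # Only grow the ignore list: shrinking the allowlist to empty would
        # flip the filter into "replicate everything" mode.
        self.ignore.add(table)
        return was_replicated


f = TableFilter(include={"public.orders"})
print(f.add_replicated_table("public.users"))  # True: snapshot public.users
print(f.add_replicated_table("public.users"))  # False: already replicated
print(f.add_ignored_table("public.orders"))    # True: drop public.orders
```

Returning a flag instead of snapshotting inline keeps the filter change separate from the (potentially long) snapshot work the caller would then schedule.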

@davisjc
Contributor

davisjc commented Sep 16, 2024

Those 2 use cases make sense, and for the first one where we're only replicating a handpicked 10, I think running a command to add the table makes sense.

For the second use case, where we're replicating everything but a handpicked 10, I think it would be unfortunate if the user had to manually add this new table (after the first 990 were implicitly chosen for replication).

@altmannmarcelo
Contributor Author

> For the second use case, where we're replicating everything but a handpicked 10, I think it would be unfortunate if the user had to manually add this new table (after the first 990 were implicitly chosen for replication).

The new table will be added to Readyset automatically when the replicator sees its DDL and the table does not match --replication-tables-ignore. That is how filtering works today.
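The behavior described above can be sketched as a replicator hook that runs when a CREATE TABLE event arrives. This is a minimal illustration assuming table names are already fully qualified; CreateTable and on_create_table are hypothetical names, not Readyset internals. It shows why the blocklist case needs no manual step while the allowlist case does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateTable:
    """Hypothetical DDL event for a newly created upstream table."""

    table: str  # fully qualified name, e.g. "public.invoices"


def on_create_table(
    event: CreateTable, include: set[str], ignore: set[str], replicated: set[str]
) -> bool:
    """Start replicating the new table unless the filters exclude it."""
    if event.table in ignore:
        return False  # matched the ignore list
    if include and event.table not in include:
        return False  # allowlist mode and the table is not listed
    replicated.add(event.table)
    return True


# Blocklist mode: the new table is picked up automatically.
replicated_a: set[str] = {"public.orders"}
print(on_create_table(CreateTable("public.invoices"), set(), {"public.audit_log"}, replicated_a))  # True

# Allowlist mode: the new table is skipped until someone adds it explicitly.
replicated_b: set[str] = {"public.orders"}
print(on_create_table(CreateTable("public.invoices"), {"public.orders"}, set(), replicated_b))  # False
```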

@altmannmarcelo altmannmarcelo added this to the v.43 milestone Sep 20, 2024
@altmannmarcelo altmannmarcelo added the Medium priority Created by Linear-GitHub Sync label Sep 20, 2024
@altmannmarcelo altmannmarcelo modified the milestones: v.43, v.44 Sep 23, 2024
Projects
None yet
Development

No branches or pull requests

2 participants