Skip topic creation/deletion for dropped streams/tables when replaying command topic on restart #2329
Adding some thoughts after investigating a solution: the key challenge here is that we still need to execute all of the commands so that the metadata store is properly updated and in sync with any other server. I can think of two approaches:

My preference is toward option 2, but it requires some non-trivial refactoring. @rodesai - any thoughts?
Things get a little more complicated: for approach 2, we need to make it so that each server records committed offsets. This can't be done (in Kafka) without registering a consumer group with the Kafka brokers - Kafka doesn't track offsets per consumer, it tracks them per group/topic/partition (see KafkaConsumer's maybeThrowInvalidGroupIdException). It gets extra tricky because each KSQL server must have its own consumer group, otherwise it won't receive all events, but each server's group must be consistent across restarts. NOTE: by default, the consumer is configured to start at the LATEST offset on the command topic; it does not actually store any offsets locally.
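The per-server-but-stable group requirement above can be sketched as consumer configuration. This is a hypothetical illustration, not the actual KSQL code: the `serverId` parameter and the group-id naming scheme are assumptions; the config keys are standard Kafka consumer properties.

```java
import java.util.Properties;

// Hypothetical sketch: derive a consumer-group id for the command-topic
// consumer that is unique per KSQL server but stable across restarts,
// so broker-side committed offsets survive a restart.
class CommandTopicConsumerConfig {
    static String commandConsumerGroupId(String serverId) {
        // One group per server so every server sees every command,
        // but the same group id on restart so committed offsets are reused.
        return "_ksql-command-topic-" + serverId;
    }

    static Properties consumerProps(String serverId) {
        Properties props = new Properties();
        props.put("group.id", commandConsumerGroupId(serverId));
        // Start from the beginning on the very first run; thereafter the
        // group's committed offset takes precedence over this setting.
        props.put("auto.offset.reset", "earliest");
        // Commit explicitly only after a command is durably applied.
        props.put("enable.auto.commit", "false");
        return props;
    }
}
```

Note that `auto.offset.reset=earliest` only applies when the group has no committed offset, which is exactly the behavior wanted here (full replay on first start, resume afterwards).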
Here's another broken recovery scenario:
When recovering, the statement CREATE STREAM FOO will fail to validate because the underlying topic is missing.
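One way out of this scenario is to gate topic-existence validation on whether we are replaying. The sketch below is illustrative only - the class and method names are assumptions, not the ksql-engine API:

```java
import java.util.Set;

// Hypothetical sketch: skip "topic exists" validation when replaying the
// command topic, since Kafka's current topic set may differ from when the
// statement was originally accepted.
class SourceTopicValidator {
    static void validate(String sourceTopic, Set<String> existingTopics, boolean isReplay) {
        if (isReplay) {
            // Assume the topic existed when the command was first accepted;
            // if it is really gone, the query fails at runtime and is
            // reported as a query error, not a replay failure.
            return;
        }
        if (!existingTopics.contains(sourceTopic)) {
            throw new IllegalArgumentException("Kafka topic does not exist: " + sourceTopic);
        }
    }
}
```

With this split, full validation still runs before a command is queued, but replay becomes insensitive to topics deleted after the fact.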
Wanted to dump my current thoughts on this category of issues. Our basic recovery strategy is to start with a well-known initial "zero-state", and then replay the command topic to rebuild our state. The issue here is that our state also includes an external system - Kafka. This has 2 problems:

1. Ownership: KSQL doesn't distinguish topics it manages from topics the user manages, so replay can create or delete topics out from under the user.
2. Unpredictability: the outcome of evaluating a statement depends on the current set of topics in Kafka, which may have changed between the original run and the replay.
To fix the ownership problem, I think we should slightly change our contract with the user. KSQL should get a "namespace" (prefix) within a Kafka cluster, and KSQL will only ever automatically create or delete topics within that namespace. If you want to use KSQL with a topic outside that namespace, it must already exist. To make the system more usable, we could support sink topic creation for external (not in our namespace) topics from KsqlResource before issuing a statement to the command topic. But that's only a convenience, and should never be done while running the command topic.

To deal with unpredictability, the outcome of evaluating a KSQL statement should not depend on the current set of topics in Kafka. We can look at this from the point of view of externally-created and internally-created topics (assuming the ownership problem is fixed as described above).

For externally-created topics, we can spuriously fail commands on recovery due to validation errors (as in the above example). Instead of validating that topics exist, when running the command topic we should just assume that those topics already exist and skip validation. Note that it's still fine, and important, to validate before submitting commands to the command topic. But we should not do this validation when running, because the current state of Kafka may differ from a previous run. If the assumption is wrong, any queries (streams jobs) that use those topics will fail, but that should be considered a user error and reported in the query status. It shouldn't cause the statement itself to fail.

For internally-created topics, we may inadvertently delete a sink topic if a later command creates a sink topic with the same name. To solve this we can isolate commands from each other by creating a 1:1 mapping between commands and topics: internally generated sink topic names should include the query id.
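The namespace contract and the 1:1 command-to-topic mapping can be sketched together. Everything here is hypothetical - the prefix value, class name, and sink-topic naming scheme are assumptions used to illustrate the proposal, not KSQL's actual conventions:

```java
// Hypothetical sketch of the "namespace" contract: KSQL only auto-creates
// or auto-deletes topics under a configured prefix, and internally
// generated sink topic names embed the query id so no two commands can
// ever collide on (and accidentally delete) the same topic.
class TopicNamespace {
    private final String prefix;

    TopicNamespace(String prefix) {
        this.prefix = prefix;
    }

    // KSQL may only automatically create or delete topics it owns.
    boolean owns(String topic) {
        return topic.startsWith(prefix);
    }

    // 1:1 mapping between commands and internal sink topics: including
    // the query id makes the name unique per command.
    String sinkTopicFor(String sinkName, String queryId) {
        return prefix + sinkName + "-" + queryId;
    }
}
```

Any topic outside the prefix (e.g. a user-managed source topic) would be treated as external: assumed to exist during replay, and never deleted by KSQL.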
@rodesai thanks for providing valuable insight! I agree that any external state manipulation should be done prior to queuing a command on the command topic. This would also solve any race conditions across multiple KSQL servers. One thing I would add about the externality problem is that we need to make sure the query id is a function of the command message alone (e.g. Kafka offset or hash of the command) - today it is a function of what commands were executed before it. In the short term, an offline discussion with @big-andy-coates leads me to believe that injecting a special …
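The point about deriving the query id from the command message alone can be made concrete. This is a hypothetical scheme (the id format only mirrors KSQL's `CSAS_`-style ids; the offset-based derivation is the suggestion above, not current behavior):

```java
// Hypothetical sketch: derive the query id purely from the command's
// offset in the command topic. Because every server sees the command at
// the same offset, the id is identical on every server and on every
// replay, regardless of which earlier commands succeeded or failed.
class QueryIdGenerator {
    static String queryId(String statementType, long commandOffset) {
        return statementType + "_" + commandOffset;
    }
}
```

Contrast this with an id derived from a counter of previously executed commands: if a replay skips or fails a different subset of earlier commands, a counter-based id would diverge between runs, while an offset-based id cannot.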
Currently, when replaying the command topic at startup, ksql executes all the commands (but skips starting queries). This includes executing Kafka topic creates and deletes, which can cause undesired side effects for users. For example, in the case:
If the user has already deleted topic FOO, then upon restart ksql will re-create the topic.
Or in the case:
If the user has created another topic FOO, ksql will delete the topic (need to verify this case experimentally).
To rectify this in the short term, we should skip deletes during replay and avoid creating topics until the full command topic has been replayed, then create only the topics required for the current set of queries.
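The short-term fix can be sketched as a pure reconciliation pass over the replayed commands. The `Command` type here is illustrative, not KSQL's actual command representation:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed replay behavior: record creates and
// drops in memory only (issuing nothing to Kafka), then after the full
// command topic is consumed, report just the topics still needed by the
// surviving queries.
class ReplayTopicReconciler {
    enum Kind { CREATE, DROP }
    record Command(Kind kind, String topic) {}

    // Returns the topics to create once replay has finished.
    static Set<String> topicsToCreate(List<Command> commandTopic) {
        Map<String, Boolean> live = new LinkedHashMap<>();
        for (Command c : commandTopic) {
            if (c.kind() == Kind.CREATE) {
                live.put(c.topic(), true);   // noted, but no topic created yet
            } else {
                live.remove(c.topic());      // noted, but no delete issued
            }
        }
        return new LinkedHashSet<>(live.keySet());
    }
}
```

For a replay of CREATE FOO, DROP FOO, CREATE BAR, only BAR would be created afterwards: FOO is neither re-created (avoiding the first bad case above) nor deleted (avoiding the second).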
Note that this is an incomplete solution to this general category of problem: