-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KLIP-20: remove TERMINATE #4126
Closed
big-andy-coates
wants to merge
3
commits into
confluentinc:master
from
big-andy-coates:KLIP-20-Remove-Terminate
Closed
Changes from 1 commit
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
# KLIP 20 - Remove 'TERMINATE' statements | ||
|
||
**Author**: @big-andy-coates | | ||
**Release Target**: TBD | | ||
**Status**: In Discussion | | ||
**Discussion**: TBD | ||
|
||
**tl;dr:** | ||
Requiring users to `TERMINATE` a persistent query before dropping a source causes friction in the | ||
user experience. The id of the query(s) that need terminating are not determinist or accessible | ||
through SQL. We propose dropping the source should automatically drop persistent query(s) that write | ||
to it. | ||
|
||
*NB*: is most likely dependent on [KLIP-17](klip-17-sql-union.md). | ||
|
||
## Motivation and background | ||
|
||
See https://github.com/confluentinc/ksql/issues/3967 and the issues it links to for more background. | ||
|
||
In short, dropping streams and tables created using `CSAS` or `CTAS` is currently painful, and | ||
practically impossible from a script. | ||
|
||
If done manually, the user must run `DESCRIBE <source>` to get the list of queries writing to the | ||
source and then stop each of these in turn, via a `TERMINATE` statement, before finally being able | ||
to drop the stream or table. | ||
|
||
From a script, it is not possible to determine the id of the queries that need to be stopped, and | ||
those query ids will change from one run to another in most cases. Hence it is generally not | ||
possible to drop a table or stream in a script. | ||
|
||
The problem is compounded by the fact that _mutiple_ persistent queries can be writing into the | ||
source or table. However, [KLIP-17](klip-17-sql-union.md) proposes to correct this, so that there | ||
will be, at-most, a single persistent query writing to any table or stream. | ||
|
||
There also seems to be no clear benefit of being able to terminate a query without dropping the | ||
associated stream or table. Especially, given there is no means to restart it. | ||
|
||
Consider that `CSAS` and `CTAS` statements are equivalent to `MATERIALIZED VIEW`s used in RDBSs, | ||
(as proposed in [KLIP-19](klip-19-materialize-views.md)). In a traditional RDBS the underlying | ||
process by which the materialized view is kept up to data is not exposed to the user. There is | ||
no 'persistent query' to either list or terminate. Hence we propose we should remove the | ||
problematic `TERMINATE` from our syntax and instead allow users to drop the view, even if there | ||
are queries writing to it. If such queries exist they will simply be stop as part of the `DROP`. | ||
|
||
## What is in scope | ||
|
||
* Removing `TERMINATE` from our SQL syntax. | ||
* Allowing streams and tables to be dropped while there are active queries writing to them. | ||
* Automatically stopping any persistent query that write to a stream or table when that stream or table is dropped. | ||
|
||
## What is not in scope | ||
|
||
* Allowing streams and tables to be dropped while they are used as a source to downstream views. | ||
This check will remain, as it is needed to ensure downstream views don't break. | ||
In the future, we may support a `DROP TABLE CASCADE` sytnax to help drop a hierarchy of MVs. | ||
* Removal of `show queries` and displaying query ids in `DESCRIBE`. Until such time that we can automatically | ||
recover failed persistent queries we should continue to expose query state to users. | ||
In time, with automated recovery, it may be possible to completely hide this complexity from users. | ||
|
||
## Value/Return | ||
|
||
Remove `TERMINATE` will make KSQL easier to operate, which will help drive adoption. | ||
|
||
## Public APIS | ||
|
||
* `TERMINATE` will be removed from our SQL syntax. | ||
* `DROP STREAM|TABLE` will allow users to drop sources that still have active upstream queries writing to them. | ||
|
||
|
||
## Design | ||
|
||
This KLIP could potentially be done independently of [KLIP-17](klip-17-sql-union.md), which restores | ||
the (1 - 0..1) relationship between source and persistent query. However, this is likely more complex | ||
as there are potentially multiple queries writing to a source. This may make recovering from the | ||
command topic more challenging. | ||
big-andy-coates marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
With [KLIP-17](klip-17-sql-union.md) complete there are _at-most_ a single persistent query updating each | ||
source. The engine could track this relationship. When the user attempts to drop the source the engine will | ||
check if the source is used in any downstream queries. If it is, then the drop will be denied, as it does today. | ||
The engine will also check for upstream queries writing to the source. If found, instead of failing, | ||
the query will be terminated. | ||
|
||
Any `TERMINATE` messages found in the command topic will be ignored, (and a warning logged). This may | ||
result in queries running after a restart that were previously stopped. | ||
|
||
The CLI will be updated to detect `TERMINATE` statements and inform the user they are no longer supported, | ||
and what to do instead, linking to more documentation. This can be done through a general pattern that can | ||
be used for other removed statement types. | ||
|
||
## Test plan | ||
|
||
Suitable tests will be added to ensure sources created via `CTAS` and `CSAS` statements can be dropped. | ||
|
||
## Documentation Updates | ||
|
||
KSQL documentation updates should simply be a case of removing any reference to `TERMINATE`. Updating the wider | ||
each system may be a little more involved. However, this is mitigated by the CLI enhancement to return a rich | ||
error message if `TERMINATE` is used, rather than the default parser error. | ||
|
||
# Compatibility Implications | ||
|
||
Removing `TERMINATE` from our syntax is a breaking change. | ||
|
||
We could choose to keep `TERMINATE` in our syntax for a period of time and have both the CLI and server | ||
print/log a message explaining the statement is not longer needed and will be ignored. However, this two | ||
stage approach is more work and ROI of the additional work may be low. | ||
|
||
The main compatibility implication is around existing `TERMINATE` commands in the command topic. | ||
It should be fine to log a warning and ignore such statements, (though we'll need special code to | ||
detect them as they won't parse). It should be safe to ignore as any `DROP` statement that comes | ||
after will still succeed. Ignoring them should not result in a corrupted metastore. The worst that | ||
can happen is a query that was previously terminated, but whose table was not dropped, would be | ||
restarted on a restart of the server. This seems acceptable. | ||
|
||
## Performance Implications | ||
|
||
None. | ||
|
||
## Security Implications | ||
|
||
None. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of removing
TERMINATE
. Can't we just drop the requirement that the query be terminated to drop the stream/table?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot drop a stream/table if a query is writing into it. Otherwise we will have queries that write into zombie stream/tables!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hjafarpour - I think @rodesai means to automatically discover and terminate the queries writing into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not drop
TERMINATE
? What use is it? Trad RDBS doesn't expose the persistent query updating materialized views to users. It's an implementation detail that shouldn't be exposed IMHO.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that dropping a MV, should imply that the corresponding query is stopped. However, I think it would still be useful to allow users to terminate a persistent query directly. I also commented on KLIP-17, as I think, keeping
INSERT INTO
as-is is useful.This KLIP seems to relate to the "ownership model" that ksqlDB is lacking (ie, it's not well defined yet). If we assume that the output topic is not owned by ksqlDB, but a query write into it, it make total sense to allow users to just terminate the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it is...
To be able to have a persistent query that writes to the topic there must be a source registered in KSQL that is backed by the topic. Currently this would either be by a
CREATE STREAM
orCREATE STREAM AS SELECT
statement. Neither of these currently embody any sense of ownership.Dropping the source created by the
CS
orCSAS
doesn't currently delete the topic, and that wouldn't change with this KLIP.What this KLIP proposes is that if the user wants to stop the query they would just drop the source. The query is an implementation detail.