Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KLIP-20: remove TERMINATE #4126

Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 121 additions & 0 deletions design-proposals/klip-20_remove_terminate.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
# KLIP 20 - Remove 'TERMINATE' statements

**Author**: @big-andy-coates |
**Release Target**: TBD |
**Status**: In Discussion |
**Discussion**: TBD

**tl;dr:**
Requiring users to `TERMINATE` a persistent query before dropping a source causes friction in the
user experience. The id of the query(s) that need terminating are not determinist or accessible
through SQL. We propose dropping the source should automatically drop persistent query(s) that write
to it.

*NB*: is most likely dependent on [KLIP-17](klip-17-sql-union.md).

## Motivation and background

See https://github.com/confluentinc/ksql/issues/3967 and the issues it links to for more background.

In short, dropping streams and tables created using `CSAS` or `CTAS` is currently painful, and
practically impossible from a script.

If done manually, the user must run `DESCRIBE <source>` to get the list of queries writing to the
source and then stop each of these in turn, via a `TERMINATE` statement, before finally being able
to drop the stream or table.

From a script, it is not possible to determine the id of the queries that need to be stopped, and
those query ids will change from one run to another in most cases. Hence it is generally not
possible to drop a table or stream in a script.

The problem is compounded by the fact that _mutiple_ persistent queries can be writing into the
source or table. However, [KLIP-17](klip-17-sql-union.md) proposes to correct this, so that there
will be, at-most, a single persistent query writing to any table or stream.

There also seems to be no clear benefit of being able to terminate a query without dropping the
associated stream or table. Especially, given there is no means to restart it.

Consider that `CSAS` and `CTAS` statements are equivalent to `MATERIALIZED VIEW`s used in RDBSs,
(as proposed in [KLIP-19](klip-19-materialize-views.md)). In a traditional RDBS the underlying
process by which the materialized view is kept up to data is not exposed to the user. There is
no 'persistent query' to either list or terminate. Hence we propose we should remove the
problematic `TERMINATE` from our syntax and instead allow users to drop the view, even if there
are queries writing to it. If such queries exist they will simply be stop as part of the `DROP`.

## What is in scope

* Removing `TERMINATE` from our SQL syntax.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of removing TERMINATE. Can't we just drop the requirement that the query be terminated to drop the stream/table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot drop a stream/table if a query is writing into it. Otherwise we will have queries that write into zombie stream/tables!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hjafarpour - I think @rodesai means to automatically discover and terminate the queries writing into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not drop TERMINATE ? What use is it? Trad RDBS doesn't expose the persistent query updating materialized views to users. It's an implementation detail that shouldn't be exposed IMHO.

Copy link
Member

@mjsax mjsax Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that dropping a MV, should imply that the corresponding query is stopped. However, I think it would still be useful to allow users to terminate a persistent query directly. I also commented on KLIP-17, as I think, keeping INSERT INTO as-is is useful.

This KLIP seems to relate to the "ownership model" that ksqlDB is lacking (ie, it's not well defined yet). If we assume that the output topic is not owned by ksqlDB, but a query write into it, it make total sense to allow users to just terminate the query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This KLIP seems to relate to the "ownership model" that ksqlDB is lacking

I don't see how it is...

If we assume that the output topic is not owned by ksqlDB, but a query write into it, it make total sense to allow users to just terminate the query.

To be able to have a persistent query that writes to the topic there must be a source registered in KSQL that is backed by the topic. Currently this would either be by a CREATE STREAM or CREATE STREAM AS SELECT statement. Neither of these currently embody any sense of ownership.

Dropping the source created by the CS or CSAS doesn't currently delete the topic, and that wouldn't change with this KLIP.

What this KLIP proposes is that if the user wants to stop the query they would just drop the source. The query is an implementation detail.

* Allowing streams and tables to be dropped while there are active queries writing to them.
* Automatically stopping any persistent query that write to a stream or table when that stream or table is dropped.

## What is not in scope

* Allowing streams and tables to be dropped while they are used as a source to downstream views.
This check will remain, as it is needed to ensure downstream views don't break.
In the future, we may support a `DROP TABLE CASCADE` sytnax to help drop a hierarchy of MVs.
* Removal of `show queries` and displaying query ids in `DESCRIBE`. Until such time that we can automatically
recover failed persistent queries we should continue to expose query state to users.
In time, with automated recovery, it may be possible to completely hide this complexity from users.

## Value/Return

Remove `TERMINATE` will make KSQL easier to operate, which will help drive adoption.

## Public APIS

* `TERMINATE` will be removed from our SQL syntax.
* `DROP STREAM|TABLE` will allow users to drop sources that still have active upstream queries writing to them.


## Design

This KLIP could potentially be done independently of [KLIP-17](klip-17-sql-union.md), which restores
the (1 - 0..1) relationship between source and persistent query. However, this is likely more complex
as there are potentially multiple queries writing to a source. This may make recovering from the
command topic more challenging.
big-andy-coates marked this conversation as resolved.
Show resolved Hide resolved

With [KLIP-17](klip-17-sql-union.md) complete there are _at-most_ a single persistent query updating each
source. The engine could track this relationship. When the user attempts to drop the source the engine will
check if the source is used in any downstream queries. If it is, then the drop will be denied, as it does today.
The engine will also check for upstream queries writing to the source. If found, instead of failing,
the query will be terminated.

Any `TERMINATE` messages found in the command topic will be ignored, (and a warning logged). This may
result in queries running after a restart that were previously stopped.

The CLI will be updated to detect `TERMINATE` statements and inform the user they are no longer supported,
and what to do instead, linking to more documentation. This can be done through a general pattern that can
be used for other removed statement types.

## Test plan

Suitable tests will be added to ensure sources created via `CTAS` and `CSAS` statements can be dropped.

## Documentation Updates

KSQL documentation updates should simply be a case of removing any reference to `TERMINATE`. Updating the wider
each system may be a little more involved. However, this is mitigated by the CLI enhancement to return a rich
error message if `TERMINATE` is used, rather than the default parser error.

# Compatibility Implications

Removing `TERMINATE` from our syntax is a breaking change.

We could choose to keep `TERMINATE` in our syntax for a period of time and have both the CLI and server
print/log a message explaining the statement is not longer needed and will be ignored. However, this two
stage approach is more work and ROI of the additional work may be low.

The main compatibility implication is around existing `TERMINATE` commands in the command topic.
It should be fine to log a warning and ignore such statements, (though we'll need special code to
detect them as they won't parse). It should be safe to ignore as any `DROP` statement that comes
after will still succeed. Ignoring them should not result in a corrupted metastore. The worst that
can happen is a query that was previously terminated, but whose table was not dropped, would be
restarted on a restart of the server. This seems acceptable.

## Performance Implications

None.

## Security Implications

None.