Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add CREATE SOURCE TABLE syntax and metadata info #7945

Merged
merged 5 commits into from
Aug 11, 2021

Conversation

spena
Copy link
Member

@spena spena commented Aug 4, 2021

Description

This is part of #7474 which adds support of source tables and streams.

This PR just adds new syntax to create source tables using CREATE SOURCE TABLE statements. It also marks those tables internally for future reference, such as CreateTable, CreateTableCommand and KsqlTable.

The KsqlTable is the object saved in the metastore. This object contains two new variables, readOnly and materialized. The readOnly just specifies the table is read-only; and materialized specifies this table is materialized (or should be materialized). I split the SOURCE keyword into these two variables because they mean different things. Also, we might support source streams in the future which they are read-only objects, but not materialized.

Testing done

Describe the testing strategy. Unit and integration tests are expected for any behavior changes.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@spena spena requested review from mjsax, jzaralim and a team August 4, 2021 17:34
Comment on lines 30 to 33
public enum Type {
SOURCE,
NORMAL
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to create the Type inside the CreateTable.Type enum because that gives better meaning to the table type we want to use (source or normal). I could have added to the parent class, CreateSource.Type, but that sounds like source is a table or stream type. I'm still not sure whether I put the type in the right place, so feedback is welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ideas:

  • Put it under CreateSource with a different name, like ownershipType
  • Add the enum to KsqlConstants in ksqldb-common - CreateTableCommand won't have to use strings anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the question is, what is exact purpose of this "type": read/write, materialized/non-materialized?

In the end, we also want to introduce a SOURCE STREAM, so putting it into the parent might make more sense?

@@ -36,7 +37,8 @@ public CreateTableCommand(
@JsonProperty(value = "topicName", required = true) final String topicName,
@JsonProperty(value = "formats", required = true) final Formats formats,
@JsonProperty(value = "windowInfo") final Optional<WindowInfo> windowInfo,
@JsonProperty(value = "orReplace", defaultValue = "false") final Optional<Boolean> orReplace
@JsonProperty(value = "orReplace", defaultValue = "false") final Optional<Boolean> orReplace,
@JsonProperty(value = "type", defaultValue = "NORMAL") final Optional<String> type
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a String instead of the CreateTable.Type because of dependencies. I would have to add ksql-parser deps to the ksql-execution, which I'm not sure about it. So I just converted the type to a string from the CreateSourceFactory which calls this class. I then converted back to CreateTable.Type on the DdlCommandExec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have to add ksql-parser deps to the ksql-execution, which I'm not sure about it.

What is the issue? Do we have other ENUM types that we write into the command topic? If yes, we should follow the same strategy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will use a boolean value instead of a Enum. I had another idea with the types to support types like external, temporary, etc in the future, so for compatibility I wanted to do it now, but I don't think that's really necessary. If they're supported, then users could mix the types, so we'll end up adding more parameters. I'll switch with the boolean instead.

@spena spena force-pushed the source_table_readonly_metadata branch from c8be139 to 404f3b3 Compare August 5, 2021 19:05
Copy link
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of question (mainly for my own education...)

Comment on lines 30 to 33
public enum Type {
SOURCE,
NORMAL
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the question is, what is exact purpose of this "type": read/write, materialized/non-materialized?

In the end, we also want to introduce a SOURCE STREAM, so putting it into the parent might make more sense?

NORMAL
}

private final Type type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add type as field here (and not in the parent class), we need to update equals() and hashcode() accordingly. (Note sure why there is only a equals() but no hashcode() method...)

@@ -27,15 +27,22 @@

@Immutable
public class CreateTable extends CreateSource implements ExecutableDdlStatement {
public enum Type {
SOURCE,
NORMAL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NORMAL is not very descriptive...

@@ -259,7 +259,11 @@ protected Void visitCreateStream(final CreateStream node, final Integer indent)

@Override
protected Void visitCreateTable(final CreateTable node, final Integer indent) {
formatCreate(node, "TABLE");
formatCreate(node,
node.getType() == CreateTable.Type.SOURCE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is SqlFormatter used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used by the QueryLogger and other injector classes that re-formats the string statement when changing something in the statement (i.e. injecting a schema fetched from Schema Registry). I don't think we should re-format string statements, but this is an old class used by some legacy code.

@@ -281,13 +281,18 @@ public Node visitCreateTable(final SqlBaseParser.CreateTableContext context) {

final Map<String, Literal> properties = processTableProperties(context.tableProperties());

final CreateTable.Type type = context.SOURCE() != null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be null ?

Also, why is the method called SOURCE() is all caps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context is what ANTlr generates based on the SqlBase.g4. For instance:

CREATE (OR REPLACE)? (SOURCE)? TABLE (IF NOT EXISTS)? sourceName
                    (tableElements)?
                    (WITH tableProperties)?                                 #createTable

The SOURCE() is all caps because that's how it is defined in the .g4 file. It is not null when the SOURCE keyword is present, null otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not null, what does it return?

ksqlTopicItems
ksqlTopicItems,
false,
false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test for source tables, too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This object is not even used anywhere in the test. I'll remove all the unused code. I'll add a test to parse CREATE SOURCE TABLE too.

new CreateTable(SOME_NAME, SOME_ELEMENTS, false, true, OTHER_PROPS, NORMAL)
)
.addEqualityGroup(
new CreateTable(SOME_NAME, SOME_ELEMENTS, false, false, OTHER_PROPS, SOURCE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTHER_PROPS -> SOME_PROPS ?

topic
topic,
false,
false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is TemporaryEngine and would we need to add support for source tables?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll add it in a follow-up PR. This TemporaryEngine is used by the ListSourceExecutorTest, which tests the DESCRIBE and SHOW commands. There's a JIRA to make sure DESCRIBE correctly handles source streams & tables.

topic
topic,
false,
false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR add a test for rejected INSERT INTO for source tables? Should it? Or do we have some other work item for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not add it. Will be in a follow-up.

ksqlTopic
ksqlTopic,
false,
false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above?

@spena spena force-pushed the source_table_readonly_metadata branch from 404f3b3 to 99871d7 Compare August 10, 2021 20:56
Copy link
Contributor

@jzaralim jzaralim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, with one comment

formatCreate(node, "TABLE");
formatCreate(node,
node.isSource()
? "SOURCE TABLE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this method as is, and update formatCreate to add the "SOURCE" keyword if the node is a source. Then we won't have to do the same thing for streams.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@spena spena force-pushed the source_table_readonly_metadata branch from 2e6d0bc to 195898f Compare August 11, 2021 13:46
@spena spena force-pushed the source_table_readonly_metadata branch from 195898f to 140f281 Compare August 11, 2021 18:57
@spena spena merged commit 70565f2 into master Aug 11, 2021
@spena spena deleted the source_table_readonly_metadata branch August 11, 2021 21:10
Copy link
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reviews. All LGTM, I'm just wondering if we want to keep the isSource as a boolean or keep it as a enum (of course for not it would only have two possible values, source, or derived).

@@ -142,10 +143,12 @@ public CreateTableCommand createTableCommand(final KsqlStructuredDataOutputNode
outputNode.getKsqlTopic().getKafkaTopicName(),
Formats.from(outputNode.getKsqlTopic()),
outputNode.getKsqlTopic().getKeyFormat().getWindowInfo(),
Optional.of(outputNode.getOrReplace())
Optional.of(outputNode.getOrReplace()),
Optional.of(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about define another isDerived here, rather than using false to indicate is not source?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name is tricky. I initially wanted to use a Enum for future purposes like External, Temp, Source, etc types, but I found that those other types can be mixed with source too, like external source, temp source, or something. So I ended up using a boolean. IsDerived makes sense too. I think any of isDerived or isSource could work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants