Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: quoted identifiers for source names #3695

Merged
merged 3 commits into from
Oct 30, 2019

Conversation

agavra
Copy link
Contributor

@agavra agavra commented Oct 29, 2019

Fixes #1466
Fixes #2171
Fixes #2176

Description

Fixes two of the long-running issues around quoted identifiers and source names:

  • AstBuilder and DataSourceExtractor no longer force upper case even when the source name is quoted
  • ParserUtil enforces that source names match state store requirements (topic naming convention)
  • sourceName is used directly in our grammar to give strong typing in the code

Testing done

  • went through a ton of tickets and manually tried reproducing them, fixed them as they went
  • QTT tests
  • Unit tests
ksql> CREATE STREAM `foo-bar` (id VARCHAR) WITH (kafka_topic='foo', value_format='JSON', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> INSERT INTO `foo-bar` (id) VALUES ('123');
ksql> CREATE STREAM `foo-too` AS SELECT * FROM `foo-bar`;

 Message
------------------------------------------------------------------------------------
 Stream foo-too created and running. Created by query with query ID: CSAS_foo-too_5
------------------------------------------------------------------------------------
ksql> SELECT * FROM `foo-too` EMIT CHANGES;
+------------------------------+------------------------------+------------------------------+
|ROWTIME                       |ROWKEY                        |ID                            |
+------------------------------+------------------------------+------------------------------+
|1572362449405                 |null                          |123                           |
|1572362449405                 |null                          |123                           |
^CQuery terminated
ksql> TERMINATE `CSAS_foo-too_5`;

 Message
-------------------
 Query terminated.
-------------------
ksql> DROP STREAM `foo-too`;

 Message
------------------------------------------------
 Source `foo-too` (topic: foo-too) was dropped.
------------------------------------------------

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@agavra agavra requested a review from a team as a code owner October 29, 2019 16:00
Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agavra -- awesome change! LGTM.

}

@Test
public void shouldAllowEscapedTerminateQuery() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add a test for INSERT INTO with a quoted source name, either here or as a QTT?

@vcrfxia vcrfxia requested a review from a team October 30, 2019 03:13
@apurvam
Copy link
Contributor

apurvam commented Oct 30, 2019

great to see these patches.. they are fixing long standing issues with KSQL which have proven super frustrating!

My knowledge of this code base is dated, but I'm trying to understand how these changes fix the issues where quoted source names are not handled correctly. Is it primarily the switch to use SourceName consistently everywhere in the code base that fixes these inconsistencies?

@agavra
Copy link
Contributor Author

agavra commented Oct 30, 2019

My knowledge of this code base is dated, but I'm trying to understand how these changes fix the issues where quoted source names are not handled correctly. Is it primarily the switch to use SourceName consistently everywhere in the code base that fixes these inconsistencies?

@apurvam We've just made changes across the board to make sure that the proper names are being piped through the codebase. This specific PR:

  1. fixes quoted source names that were lowercase because previously we forced uppercase even if it was quoted and
  2. uses the correct parsing from TERMINATE query, which previously would use the exact text from the identifier instead of properly un-escape it

The main changes (i.e. around java code generation) were done previously. The SourceName change makes things easier to reason about, but I'm not sure it helped "fix" any issues.

@agavra agavra changed the base branch from master to 5.4.x October 30, 2019 19:30
@agavra agavra merged commit 7d3cf92 into confluentinc:5.4.x Oct 30, 2019
@agavra agavra deleted the restrict_source_name branch October 30, 2019 19:31
@apurvam
Copy link
Contributor

apurvam commented Oct 31, 2019

Got it, thanks @agavra !

@pkgonan
Copy link
Contributor

pkgonan commented Jul 22, 2021

Is there any solution about Describe command ?

[Request]

DESCRIBE 'abc-data.kk_bb.ee_dd.mm_zz' EXTENDED;

[Response]

ksql> DESCRIBE 'abc-data.kk_bb.ee_dd.mm_zz' EXTENDED;
line 1:10: no viable alternative at input 'DESCRIBE 'abc-data.kk_bb.ee_dd.mm_zz''
Statement: DESCRIBE 'abc-data.kk_bb.ee_dd.mm_zz' EXTENDED;
Caused by: line 1:10: no viable alternative at input 'DESCRIBE
	'abc-data.kk_bb.ee_dd.mm_zz''
Caused by: org.antlr.v4.runtime.NoViableAltException

@vcrfxia
Copy link
Contributor

vcrfxia commented Jul 22, 2021

Hi @pkgonan have you tried using backticks, rather than single quotes? As in:

DESCRIBE `abc-data.kk_bb.ee_dd.mm_zz` EXTENDED;

instead of 'abc-data.kk_bb.ee_dd.mm_zz'?

@agavra
Copy link
Contributor Author

agavra commented Jul 22, 2021

I tried repro'ing locally and I noticed a few things:

  1. you can't create an entity with . in it's name, so I"m wondering how you got the query you're trying to describe:
ksql> create stream `b.b.q` as select * from foo;
Illegal argument at Line: 1, Col: 15. Source names may only contain alphanumeric values, '_' or '-'. Got: 'b.b.q'
  1. You can't describe, queries, just tables or streams:
ksql> describe `CSAS_B-B-Q_3`;
Could not find STREAM/TABLE 'CSAS_B-B-Q_3' in the Metastore
  1. I can't repro the issue using only -:
ksql> describe `b-b-q` extended;

Name                 : b-b-q
Type                 : STREAM
Timestamp field      : Not set - using <ROWTIME>
Key format           : JSON
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants