Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable substitution doesn't work with field names in projection or where clause #8250

Open
mikebin opened this issue Oct 12, 2021 · 12 comments
Assignees

Comments

@mikebin
Copy link

mikebin commented Oct 12, 2021

Describe the bug
Variable substitution fails for field names in the query projection or where clause. According to example in docs here, it should work.

To Reproduce
Version: 6.2.1

ksql> define colName1 = 'LEVEL';
ksql> select ${colName1} from ksql_processing_log emit changes;
Illegal argument at Line: 1, Col: 8. Got: 'LEVEL'
Statement: select ${colName1} from ksql_processing_log emit changes;
ksql> define colName1 = 'LEVEL';
ksql> select * from ksql_processing_log where ${colName1} = 'ERROR';
Illegal argument at Line: 1, Col: 41. Got: 'LEVEL'
Statement: select * from ksql_processing_log where ${colName1} = 'ERROR';

Expected behavior
Query succeeds, substituting variable (colName1) value for field name.

Actual behaviour
See above for error details.

Additional context
Initial feedback from @spena:

Seems the parser is detecting the ${colName1} from the where clause as a variable literal and not a variable identifier, so the parsing fails because level is not a string literal

@sajjadnazrulla
Copy link

I would like to work on this. Can this be assigned to me.

@sajjadnazrulla
Copy link

sajjadnazrulla commented Oct 24, 2021

@spena I would need some help here. I am thinking it is not possible to define separate parser rules for variableLiteral and variableIdentifier at the Grammar level.
Given that how do we validate against unquoted Strings for variable literals, while allowing the same for variable identifiers

@spena
Copy link
Member

spena commented Oct 25, 2021

@sajjadnazrulla I agree. This cannot be fixed at the grammar level (in SqlBase.g4). I found a couple of issues with the grammar if we try to fix it there. My option would be to fix it in the VariableSubstitutor class. First, I suggest to merge the variableLiteral and variableIdentifier labels from the grammar into one, say variableReference. The reason I split it before was to detect variable literals and escape them to avoid sql injection.

I am thinking if we can detect the variable in the new visitVariableReference, then check whether the value is a literal or not by using the same literal rules. If it is a literal, then escape it, otherwise check if it is an identifier. If it is not a literal nor an identifier, then throw an error. I wan to avoid users to use variables with spaces.

@sajjadnazrulla
Copy link

@spena Correct me if I am wrong here. Is the issue more deep rooted? When the grammar parses the statement -

CREATE STREAM stream1 (
  id INT
) WITH (
  kafka_topic = 'stream1',
  value_format = '${format}',
  replicas = ${replicas}
);

${replicas} is lexed as a VARIABLE, but '${format}' is lexed as a string.

Should we remove the VARIABLE and STRING from literal and include a nonVariableString
tableProperty can then be defined as

tableProperty
    : (identifier | STRING) EQ (literal | VARIABLE | QUOTED_VARIABLE)
    ;

QUOTED_VARIABLE
  : '\'' VARIABLE '\''
 ;

@spena
Copy link
Member

spena commented Nov 2, 2021

Right, ${replicas} is lexed as a VARIABLE and '${format}' is lexed as a STRING. VariableSubstitutor handles this case when it finds variables inside a string, i.e. 'This is a variable: ${var1}'.

I don't think we can do a nonVariableString because we support variables inside strings too. They should be escaped. Look at VariableSubstitutor.visitStringLiteral. It unwraps the text (removes single-quotes), looks for variables and sanitized them. These variables will be replaced later by the StringSubstitutor.

@sajjadnazrulla
Copy link

@spena Could you check this WIP :-
https://gist.github.com/sajjadnazrulla/050a7ecfb538199360fe976506cdd7af
Just wanted to get your thoughts on if I am going in the right direction here.

I would still need to handle AstBuilder and QueryAnonimizer to handle TableProperty changes.

QueryAnonimizer is fairly straightforward. AstBuilder changes might be a bit more large.

@spena
Copy link
Member

spena commented Nov 8, 2021

@sajjadnazrulla I think you're going in the right direction. Just make sure to sanitize the variables that are inside strings. I run the VariableSubstitutorTest and it has some failures with the changes.

Regarding the AstBuilder, what changes do you expect to do there? I looked at it and don't know what must be done.

@sajjadnazrulla
Copy link

sajjadnazrulla commented Nov 9, 2021

Thanks @spena .
Since I have removed VARIABLE from the literal in SqlBase.g4, tableProperty is now (identifier | STRING) EQ (literal | VARIABLE)

I would need to handle the tableProperty Value - Check if its a variable or a literal and handle accordingly. This is why the tests are failing. This needs to be done in AstBuilder and QueryAnonymizer also. The changes in QueryAnonymizer will be to change a single method. AstBuilder will be a bit more complex since it returns a Map<String, Literal>

@spena
Copy link
Member

spena commented Nov 9, 2021

I see. I don't think you need to do changes in the AstBuilder. Variables must be substituted before getting into the AstBuilder. See https://github.com/confluentinc/ksql/blob/master/ksqldb-engine/src/main/java/io/confluent/ksql/engine/EngineContext.java#L174

...
final PreparedStatement<?> preparedStatement =
          parser.prepare(substituteVariables(stmt, variablesMap), metaStore);
...

The prepare method is the one that uses the AstBuilder to generate the statement objects. But variables are substituted before calling prepare. I did that because AstBuilder is complex and if a developer adds a new syntax, then the developer must be sure to replace variables too. It was better to do it once.

^ same comment for the QueryAnonymizer. I don't think you need to do changes there either.

@sajjadnazrulla
Copy link

Thanks @spena - This really helps

I modified tableProperty as (identifier | STRING) EQ (literal | identifierValue=identifier) and AstBuilder/QueryAnonymizer works fine.

Just a question on the validations now :
We have tests to say 't1', value_format='AVRO' is a valid value while 'EXTENDED _id_' is invalid.

Should we change the validation to just say no spaces within quotes. The validation for @ also might not be possible since we dont differentiate between literals and identifiers

@spena
Copy link
Member

spena commented Nov 10, 2021

The 't1', value_format='AVRO' is good there. It is just making sure strings are escaped while substituting variables. An error will be thrown later by the AstBuilder. Substitution happens before that, so we are good.

See how the string has extra single-quote when escaped:

Pair.of("CREATE STREAM s1 WITH (kafka_topic=${escapeQuote});",
            "CREATE STREAM s1 WITH (kafka_topic='t1'', value_format=''AVRO');")

The above is just variable substitution happening. AstBuilder will fail later because the value in the variable you used is not valid.

@suhas-satish
Copy link
Member

@sajjadnazrulla , are you still planning to work on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants