Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use coherent naming scheme for generated java code #3417

Merged
merged 1 commit into from
Sep 28, 2019

Conversation

agavra
Copy link
Contributor

@agavra agavra commented Sep 25, 2019

Description

This patch ensures that all generated code uses valid Java identifiers instead of arbitrary column names by generating varX where X is the column index for the parameter being passed in. Note that the only "real" changes in this PR are in SqlToJavaVisitor and CodeGenRunner (both of which are single line changes).

As part of this change I also:

  • Column also has an index
  • LogicalSchema.Builder tracks Column Indices
  • LogicalSchema uses the builder to do internal copies to maintain unique indices (when copying metadata/keys into value)

Testing done

  • mvn clean install

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

vcrfxia
vcrfxia previously approved these changes Sep 26, 2019
Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agavra -- LGTM with a question inline.

final List<Column> suppliedKey = keyBuilder.build();

final List<Column> key = suppliedKey.isEmpty()
? IMPLICIT_KEY_SCHEMA
: suppliedKey;

return new LogicalSchema(
METADATA_SCHEMA,
metadata.isEmpty() ? METADATA_SCHEMA : metadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely understanding this change. Is the idea that external Builder instances will continue to use the default metadata schema, but that internal Builder instances will copy the metadata around?

The metadata schema is not updated anywhere, is it? So this is equivalent to always returning the default metadata schema, except "safer" in that we copy when we should. Is that correct?

Copy link
Contributor Author

@agavra agavra Sep 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the comment would help here - basically the metadata is actually updated when we add/remove aliases :( I'll be trying to remove that in the long run.

     // used only for internal copy (keeps the alias from the original schema)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got'cha. And all of that is still managed by this class, which is why the metadataColumns() method can be private. Thanks for clarifying!

@vcrfxia vcrfxia requested a review from a team September 26, 2019 22:16
@purplefox
Copy link
Contributor

Hey @agavra , could you give an example of some generated code that's changed because of these changes?

Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agavra

I've got no issue with the core change to the code generation - this is a great improvement IMHO.

However, I do have an objection to the change to Column. It smells! See comment inline...

* Unknown index, used for key columns that have not
* been resolved to a value column.
*/
public static final int UNKNOWN_IDX = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just have OptionalInt index, rather than some magic special number?

private final ColumnRef ref;
private final SqlType type;
private final int index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hummm.... index is not included in equals and hashCode.

I'm guessing you're doing this as otherwise lots of tests and other things start to fail, right?

These feels wrong and will likely cause bugs or at least tests that don't test what people think they do.

It looks like the functionality in this PR can be implemented without the change to this class.

I suggest one of the following approaches:

  1. Don't make this change to this class, and just continue to use the existing valueColumnIndex impl in LogicalSchema for now.
  2. Make this change, but implement hashCode and equals properly. As part of this we should avoid building instances of Column to pass to things where we don't care about the index. Escentially, if a method only cares about the name and type, then that's not a Column with this change. e.g. the following is an example of existing code that would become a misuse of Column. The doFindKeyColumn method should take the name and type as separate params:
return doFindKeyColumn(selectExpressions, Column.of(keyField.name(), keyField.type()))
          .map(field -> LegacyField.of(ColumnName.of(field.fullName()), field.type()));
  1. Leave column as-is, but change LogicalSchema to track the indexes better.
  2. Build the field -> index mapping just in the java generation code.
  3. Something else.

TBH, there's a lot of changes going in around the logical schema, columns, and copying of rowkey and rowtime into the value schema. Once the dust settles on all this good refactoring it might well be that the only place that cares about indexes is the java code builder. Hence, I'd urge you to just go with option 1 above for now.

@@ -51,7 +51,8 @@ public void shouldImplementEqualsProperly() {
new EqualsTester()
.addEqualityGroup(
Column.of(SOME_NAME, SqlTypes.INTEGER),
Column.of(SOME_NAME, SqlTypes.INTEGER)
Column.of(SOME_NAME, SqlTypes.INTEGER),
Column.of(SOME_NAME, SqlTypes.INTEGER).withIndex(1) // index is not considered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug! ;)

@@ -180,7 +180,7 @@ private void addParameter(final Column schemaColumn) {
parameters.add(new ParameterType(
SQL_TO_JAVA_TYPE_CONVERTER.toJavaType(schemaColumn.type()),
schemaColumn.fullName(),
schemaColumn.fullName().replace(".", "_"),
CodeGenUtils.paramName(schemaColumn),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay

}

static String paramName(final Column column) {
KsqlPreconditions.checkArgument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This throws an KsqlException which is reserved for user errors. This is not a user error...

@big-andy-coates
Copy link
Contributor

Hey @agavra , could you give an example of some generated code that's changed because of these changes?

@purplefox the sorts of changes can be see in the SqlToJavaVisitorTest.java. However, in short, KSQL works by generating text containing java code and the compiling it. SqlToJavaVisitorTest is the class that builds the java code. Previously, it would use the column names in the java code. This meant our column names needed to be valid Java identifiers. With @almog's change we replace the use of column names with indexed variables, so rather than myFirstColumn you get var0 in the Java code.

@agavra agavra dismissed vcrfxia’s stale review September 27, 2019 17:36

making major change after @big-andy-coates' suggestion

Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that's a lot smaller and more contained!

Thanks @agavra, LGTM

@agavra agavra merged commit 06a2a57 into confluentinc:master Sep 28, 2019
@agavra agavra deleted the java_identifiers branch September 28, 2019 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants