Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Add view interfaces #4925

Merged
merged 1 commit into from
Dec 5, 2022
Merged

API: Add view interfaces #4925

merged 1 commit into from
Dec 5, 2022

Conversation

jzhuge
Copy link
Member

@jzhuge jzhuge commented Jun 1, 2022

Co-authored-by: [email protected]

@github-actions github-actions bot added the API label Jun 1, 2022
@jzhuge
Copy link
Member Author

jzhuge commented Jun 1, 2022

Split from #4567

@nastra
Copy link
Contributor

nastra commented Jun 1, 2022

@jzhuge I think you wanted to link to #4657 here :)

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @jzhuge for the contribution! @jackye1995 @nastra let us know if you have any further comments.

cc: @rdblue @danielcweeks

@jzhuge
Copy link
Member Author

jzhuge commented Jun 20, 2022

Thanks @amogh-jahagirdar.

BTW, I am also ok if we decide to remove UpdateViewProperties and just use the existing UpdateProperties.

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jackye1995
Copy link
Contributor

@rdblue do you have any additional comment for adding these view APIs?

@jzhuge
Copy link
Member Author

jzhuge commented Jun 30, 2022

If no more comment, is it possible to get it merged so that I can use it in the Core PR?

@jzhuge
Copy link
Member Author

jzhuge commented Jul 19, 2022

Hi @danielcweeks, could you please take a look and merge if it looks ok?

@danielcweeks danielcweeks self-requested a review July 22, 2022 18:28
@jzhuge
Copy link
Member Author

jzhuge commented Aug 15, 2022

Hi @danielcweeks, have you got a chance to take a look?

long timestampMillis();

/**
* Returns the version summary such as the name and genie-id of the operation that created that version of the view
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to remove references to genie.

@danielcweeks
Copy link
Contributor

@jzhuge or @anjalinorwood I feel like I'm missing something in terms of where we actually access the view content / text? I would think that would live in ViewRepresentation or ViewVersion?

@amogh-jahagirdar
Copy link
Contributor

@danielcweeks I'll let @jzhuge @anjalinorwood confirm but the model at least in my mind is that there is an implementation of ViewRepresentation; for the SQL view representation, this would be SqlViewRepresentation which would expose a sql() method for surfacing the literal text. Down the line there would be other view representation types such as for substrait which would expose the view representation in their own way (serialized plan node for example).

@jzhuge
Copy link
Member Author

jzhuge commented Aug 30, 2022

Merged Amogh's PR, rebased, and applied spotless.

@jzhuge jzhuge force-pushed the view-api branch 2 times, most recently from 65b7b6e to e3de593 Compare November 6, 2022 21:52
String dialect();

/** The default catalog when the view is created. */
String defaultCatalog();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a getter. wondering why default...? should it just be catalog()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a Spark/Presto SQL session, you can run USE to set default catalog and namespace, thus the "default".

@jzhuge
Copy link
Member Author

jzhuge commented Nov 7, 2022

Created #6134 to add the missing field query-column-names to SQL view representation in the view spec.

@wmoustafa
Copy link
Contributor

  • No need for something like View.updateRepresentions(), as buildView() + replace() should be enough

Can one replace the current version or an old version by adding a new dialect? Does it result in a new version?

api/src/main/java/org/apache/iceberg/view/ViewVersion.java Outdated Show resolved Hide resolved
* @param query view query
* @return this for method chaining
*/
ViewBuilder withQuery(String query);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original question was that this builder returns a View, and in this discussion, you mentioned that withQuery() is the method responsible for the fact that it should return SQLViewRepresentation. However, I think this method still does not enforce it in this iteration.

Comment on lines +31 to +33
/** The view query SQL text. */
String query();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would not this method be also common with other representations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the SQL text of the view. Other view representations don't necessarily have a SQL query. I think this is reasonable.

Comment on lines +34 to +36
/** The view query SQL dialect. */
String dialect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this needs to be an Enum and somehow be merged with Type. @rdblue, @danielcweeks, @amogh-jahagirdar, thoughts? Further, it is better for the spec to take a position on supported dialects, or not take a position at all, but in this case SQL should not be a special type. Right now the position sounds a bit fuzzy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree that we want to define supported dialects. We don't want engine writers to need to come to the Iceberg community and get a new symbol in an enum before storing a view. I don't see much benefit to doing that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to do either: not take a position at all or take a well defined position. There could be a few scenarios where the intended meaning is not achieved. For example:

  • One defines the view SQL using a non-SQL string (e.g., some other DSL), and still leverages SQLViewReperesentation for that. Even if this is permitted, it is not clear from the spec if it is okay.
  • The dialect could be the same but not standardized, e.g., dialects such as spark, Spark, SparkSQL, Spark SQL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe what we need is an API to add supported dialects to the metadata? It is up to each catalog implementation to add their dialects, and view definitions can be associated with a dialect ID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe what we need is an API to add supported dialects to the metadata? It is up to each catalog implementation to add their dialects, and view definitions can be associated with a dialect ID.

Already added

  /**
   * Add a SQL {@link View} for a different dialect.
   *
   * @param sqlViewRepresentation a SQL view representation
   * @return this for method chaining
   * @throws IllegalArgumentException if the dialect is the same as the SQL view being built
   */
  ViewBuilder withOtherSQLRepresentation(SQLViewRepresentation sqlViewRepresentation);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to making dialects (not just view representations) a construct in the metadata where people can define and add them, e.g., some catalogs define SparkSQL, id = 1 and TrinoSQL, id = 2, and each view representation references a standardized dialect ID. They are not part of the Iceberg spec, but defining and adding them dynamically is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One defines the view SQL using a non-SQL string

I don't think this is allowed. If you create a SQL view then it should contain SQL. In any case, being strict about dialects doesn't help prevent this.

The dialect could be the same but not standardized

This is a risk, but I think it isn't very likely. We should document the known dialect strings, but we shouldn't switch it to be an enum.

@jzhuge
Copy link
Member Author

jzhuge commented Nov 8, 2022

  • No need for something like View.updateRepresentions(), as buildView() + replace() should be enough

Can one replace the current version or an old version by adding a new dialect? Does it result in a new version?

Not supported.

To add a new dialect, buildView() + withOtherSQLRepresentation() + replace()

@wmoustafa
Copy link
Contributor

buildView() + withOtherSQLRepresentation() + replace() automatically copies existing dialects from previous versions or if we want to preserve them, we need to add them again?

@jzhuge
Copy link
Member Author

jzhuge commented Nov 8, 2022

buildView() + withOtherSQLRepresentation() + replace() automatically copies existing dialects from previous versions or if we want to preserve them, we need to add them again?

buildView() (with no argument) will start from scratch, i.e., we need to add everything again.

@rdblue
Copy link
Contributor

rdblue commented Nov 27, 2022

@jzhuge, this looks good to go. Can you rebase and fix tests?

@jzhuge
Copy link
Member Author

jzhuge commented Dec 2, 2022

@rdblue Please merge.

@rdblue rdblue merged commit b2f4694 into apache:master Dec 5, 2022
@rdblue
Copy link
Contributor

rdblue commented Dec 5, 2022

Merge! Let me know where the implementation PR is and I'll start looking at that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants