Spec: Add query-column-names to SQL view representation in view spec #6134

Closed
wants to merge 1 commit into from
10 changes: 9 additions & 1 deletion format/view-spec.md
@@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S
| Optional | schema-id | ID of the view's schema when the version was created |
| Optional | default-catalog | A string specifying the catalog to use when the table or view references in the view definition do not contain an explicit catalog. |
| Optional | default-namespace | The namespace to use when the table or view references in the view definition do not contain an explicit namespace. Since the namespace may contain multiple parts, it is serialized as a list of strings. |
Contributor

nit: `references` -> `referenced` (with a "d" at the end)?

| Optional | query-column-names | The output column names of the query when the view was created. The field aliases are not applied. The list should have the same length as the schema's top-level fields. See the example below. |
| Optional | field-aliases | A list of strings of field aliases optionally specified in the `CREATE VIEW` statement. The list should have the same length as the schema's top-level fields. See the example below. |
| Optional | field-docs | A list of strings of field comments optionally specified in the `CREATE VIEW` statement. The list should have the same length as the schema's top-level fields. See the example below. |

For `CREATE VIEW v (alias_name COMMENT 'docs', alias_name2, ...) AS SELECT col1, col2, ...`,
the field aliases are 'alias_name', 'alias_name2', etc., and the field docs are 'docs', null, etc.

The view schema should have the field aliases applied.
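
A minimal Python sketch of how these optional fields relate for the example above, assuming the view selects just two columns from a hypothetical table `t`. The class `SqlViewRepresentation` and its methods are illustrative only, not an Iceberg API; the fallback to the query's column names when `field-aliases` is absent is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SqlViewRepresentation:
    """Hypothetical in-memory model of the optional SQL representation fields."""
    sql: str
    query_column_names: Optional[List[str]] = None      # aliases NOT applied
    field_aliases: Optional[List[str]] = None
    field_docs: Optional[List[Optional[str]]] = None

    def validate(self, num_top_level_fields: int) -> None:
        # Each optional list, when present, should have one entry per top-level schema field.
        for name in ("query_column_names", "field_aliases", "field_docs"):
            values = getattr(self, name)
            if values is not None and len(values) != num_top_level_fields:
                raise ValueError(
                    f"{name} has {len(values)} entries, expected {num_top_level_fields}"
                )

    def schema_field_names(self) -> List[str]:
        # The view schema has the field aliases applied; without aliases,
        # this sketch assumes the query's own output column names are used.
        if self.field_aliases is not None:
            return list(self.field_aliases)
        return list(self.query_column_names or [])


# CREATE VIEW v (alias_name COMMENT 'docs', alias_name2) AS SELECT col1, col2 FROM t
rep = SqlViewRepresentation(
    sql="SELECT col1, col2 FROM t",
    query_column_names=["col1", "col2"],
    field_aliases=["alias_name", "alias_name2"],
    field_docs=["docs", None],
)
rep.validate(num_top_level_fields=2)
print(rep.schema_field_names())  # ['alias_name', 'alias_name2']
```

Keeping `query-column-names` separate from `field-aliases` is what lets a reader match the query's output by its original names while still exposing the aliased names in the view schema.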

For `SELECT *` view queries, the schema of the underlying table or view may change after the view has been created.
Thus, we need to store the column names of the view query: when the view is used, we need to pick the columns by the
names, and in the order, recorded when the view was created, and omit any extra columns the view does not require.
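
As a sketch of what the stored names enable, assume an engine re-runs the stored `SELECT *` query and gets back its current output column names. The hypothetical helper `resolve_projection` then picks columns by name, in the recorded order, and drops columns added later. Rejecting a missing or duplicated name is an assumption of this sketch, not something the spec text mandates.

```python
from typing import List, Tuple


def resolve_projection(stored_names: List[str], current_names: List[str]) -> List[int]:
    """Map each stored query column name to its position in the current query output.

    Columns are picked by name, in the order recorded when the view was created;
    columns that were added since then are simply not selected.
    """
    indexes = []
    for name in stored_names:
        matches = [i for i, current in enumerate(current_names) if current == name]
        if not matches:
            raise ValueError(f"column {name!r} no longer exists")
        if len(matches) > 1:
            raise ValueError(f"column {name!r} is ambiguous: {len(matches)} matches")
        indexes.append(matches[0])
    return indexes


stored = ["id", "name"]               # query-column-names recorded at creation
current = ["id", "email", "name"]     # same SELECT * after `email` was added before `name`
indexes = resolve_projection(stored, current)
print(indexes)  # [0, 2]

row: Tuple = (1, "ann@example.com", "Ann")
projected = tuple(row[i] for i in indexes)
print(projected)  # (1, 'Ann')
```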

Comment on lines +130 to +131
Contributor

Isn't there a chance for them to become ambiguous after schema evolution? Most query engines just fail the query if the underlying tables evolve in a way that changes the number of columns a `*` expands to.

Contributor

So if you create a `SELECT *` view on table `t1` with columns `(a, b)` joined with table `t2` with columns `(x)`, then add column `a` to `t2`, the query text becomes ambiguous. In fact, I tried it on Spark, and it threw an exception when querying the view: `The SQL query of view db1.v has an incompatible schema change and column a cannot be resolved. Expected 1 columns named a but got [a,a]`
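The scenario can be reproduced without an engine. This sketch assumes `*` over a join expands to each table's columns in FROM-clause order (common engine behavior, but an assumption here); `expand_star` and `duplicated_names` are hypothetical helpers. After the change, the stored name `a` matches two output columns, which is the situation Spark reports as `Expected 1 columns named a but got [a,a]`.

```python
from collections import Counter
from typing import Dict, List


def expand_star(tables: Dict[str, List[str]]) -> List[str]:
    """Unqualified output names of `SELECT * FROM t1 JOIN t2 ...`, in FROM-clause order."""
    return [column for columns in tables.values() for column in columns]


def duplicated_names(names: List[str]) -> List[str]:
    """Names that appear more than once and so cannot be resolved by name alone."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


before = {"t1": ["a", "b"], "t2": ["x"]}
after = {"t1": ["a", "b"], "t2": ["x", "a"]}   # column `a` added to t2

stored_query_column_names = expand_star(before)
print(stored_query_column_names)             # ['a', 'b', 'x']
print(expand_star(after))                    # ['a', 'b', 'x', 'a']
print(duplicated_names(expand_star(after)))  # ['a']
```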

The above sounds like an implementation detail of Spark, and it probably should not be exposed by Iceberg. One workaround is to record table names as well, not just view names. Another is to expect the query text to be expanded (i.e., the `*` expanded) at view creation time, as Hive does.
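Both workarounds avoid the ambiguity because each recorded column is identified together with its table. A minimal sketch, assuming a two-table join and a hypothetical join condition `t1.b = t2.x` (the comment does not give one). `qualified_star` and `expanded_sql` are illustrative helpers, not Iceberg or Hive APIs, and the generated text only approximates the expanded view text Hive keeps.

```python
from typing import Dict, List, Tuple


def qualified_star(tables: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Expand `*` into (table, column) pairs instead of bare column names."""
    return [(table, column) for table, columns in tables.items() for column in columns]


def expanded_sql(tables: Dict[str, List[str]], on: str) -> str:
    """Rewrite `SELECT * FROM t1 JOIN t2 ON ...` with an explicit, qualified select list."""
    left, right = tables  # this sketch only handles a two-table join
    select_list = ", ".join(f"{table}.{column}" for table, column in qualified_star(tables))
    return f"SELECT {select_list} FROM {left} JOIN {right} ON {on}"


created = {"t1": ["a", "b"], "t2": ["x"]}
print(expanded_sql(created, on="t1.b = t2.x"))
# SELECT t1.a, t1.b, t2.x FROM t1 JOIN t2 ON t1.b = t2.x

# Later, column `a` is added to t2; every recorded (table, column) pair still matches once.
recorded = qualified_star(created)
evolved = {"t1": ["a", "b"], "t2": ["x", "a"]}
current = qualified_star(evolved)
print(all(current.count(pair) == 1 for pair in recorded))  # True
```

Expanding at creation time freezes the select list, so columns added later are not picked up; that matches the intent of omitting extra columns for `SELECT *` views, at the cost of storing rewritten rather than original SQL text.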


## Appendix A: An Example
