Spec: Add query-column-names to SQL view representation in view spec #6134

jzhuge · 2022-11-07T04:31:01Z

The current view spec is missing the `query-column-names` field in the SQL view representation.

For `SELECT *` view queries, the schema of the underlying table or view may change after the view has been created.
Thus, we need to store the column names of the view query: when the view is used, it is better to select columns
by the names and order recorded when the view was created, and to omit any extra columns that are not required.
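A minimal sketch of the reader-side behavior described above, assuming the view stores its query column names as a plain list (the role of the proposed `query-column-names` field) and that the engine can report the names `SELECT *` currently expands to. `project_by_stored_names` is a hypothetical helper for illustration, not part of Iceberg or any engine.

```python
from typing import Dict, List, Sequence


def project_by_stored_names(
    stored_names: Sequence[str],
    current_names: Sequence[str],
    rows: List[Sequence[object]],
) -> List[List[object]]:
    """Reorder and trim rows to match the column names recorded at view creation.

    stored_names  -- hypothetical stand-in for the proposed query-column-names field
    current_names -- what `SELECT *` expands to against today's schemas
    """
    # Map each current column name to its position; a duplicate name makes
    # a by-name lookup ambiguous, so refuse it.
    positions: Dict[str, int] = {}
    for i, name in enumerate(current_names):
        if name in positions:
            raise ValueError(f"ambiguous column name: {name}")
        positions[name] = i

    # Every column the view was created with must still exist.
    missing = [n for n in stored_names if n not in positions]
    if missing:
        raise ValueError(f"columns no longer present: {missing}")

    # Pick columns by stored name and order; columns added later are dropped.
    indexes = [positions[n] for n in stored_names]
    return [[row[i] for i in indexes] for row in rows]
```

For example, a view created over `(id, name)` keeps returning exactly those two columns after the table gains an `email` column: `project_by_stored_names(["id", "name"], ["id", "email", "name"], [[1, "a@x", "Ann"]])` yields `[[1, "Ann"]]`.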

wmoustafa · 2022-11-07T08:30:49Z

format/view-spec.md

+according to the name and order when the view was created and omit the extra columns we don't require.
+


Isn't there a chance for them to become ambiguous after schema evolution? Most query engines just fail the query if the underlying tables evolve in a way that changes the number of columns a `*` expands to.

So if you create a `SELECT *` view on table `t1` with columns `(a, b)` joined with table `t2` with columns `(x)`, and then add column `a` to `t2`, the query text becomes ambiguous. In fact, I tried it on Spark, and querying the view threw an exception: `The SQL query of view db1.v has an incompatible schema change and column a cannot be resolved. Expected 1 columns named a but got [a,a]`
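The failure in this example can be reproduced with a small model of `*` expansion plus by-name resolution. This is a sketch under simplifying assumptions: `expand_star` and `resolve` are hypothetical helpers, `*` is modeled as the unqualified column names of each joined table in order, and the error text is illustrative rather than Spark's.

```python
from typing import Dict, List


def expand_star(tables: Dict[str, List[str]]) -> List[str]:
    """Unqualified output names of `SELECT *` over the joined tables, in table order."""
    return [col for cols in tables.values() for col in cols]


def resolve(name: str, output: List[str]) -> int:
    """Return the single output position named `name`; fail on zero or several matches."""
    matches = [i for i, col in enumerate(output) if col == name]
    if len(matches) != 1:
        raise ValueError(f"expected 1 column named {name!r}, found {len(matches)}")
    return matches[0]


# At view creation: t1(a, b) JOIN t2(x) expands to [a, b, x]; `a` is unique.
before = expand_star({"t1": ["a", "b"], "t2": ["x"]})

# After adding `a` to t2: the expansion is [a, b, x, a]; `a` is now ambiguous.
after = expand_star({"t1": ["a", "b"], "t2": ["x", "a"]})
```

Here `resolve("a", before)` returns `0`, while `resolve("a", after)` raises, because storing only the column name `a` cannot say which of the two `a` columns the view meant.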

The above sounds like a Spark implementation detail and probably should not be exposed by Iceberg. One workaround is to record table names as well, not just view names. Another is to expand the query text (i.e., expand the `*`) at view creation time, as Hive does.
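A rough sketch of the second option, expanding `*` at creation time. It assumes the engine already knows each table's (or alias's) columns when the view is created, and it rewrites only the projection, passing the `FROM` clause through unchanged. `expand_star_qualified` is a hypothetical helper and does not reflect Hive's actual implementation.

```python
from typing import Dict, List


def expand_star_qualified(schemas: Dict[str, List[str]], from_clause: str) -> str:
    """Replace `SELECT *` with an explicit, table-qualified column list.

    schemas     -- table name (or alias, as used in the query) -> columns at creation time
    from_clause -- the original FROM/JOIN text, kept verbatim
    """
    cols = [f"{table}.{col}" for table, columns in schemas.items() for col in columns]
    if not cols:
        raise ValueError("no columns to expand")
    return "SELECT " + ", ".join(cols) + " " + from_clause


sql = expand_star_qualified(
    {"t1": ["a", "b"], "t2": ["x"]},
    "FROM t1 JOIN t2 ON t1.b = t2.x",
)
# sql == "SELECT t1.a, t1.b, t2.x FROM t1 JOIN t2 ON t1.b = t2.x"
```

Because every reference is qualified and the list is frozen at creation, adding `a` to `t2` later neither introduces ambiguity nor changes the view's output columns.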

dimas-b · 2022-11-07T14:35:11Z

format/view-spec.md

@@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S
 | Optional | schema-id | ID of the view's schema when the version was created |
 | Optional | default-catalog | A string specifying the catalog to use when the table or view references in the view definition do not contain an explicit catalog. |
 | Optional | default-namespace | The namespace to use when the table or view references in the view definition do not contain an explicit namespace. Since the namespace may contain multiple parts, it is serialized as a list of strings. |


nit: `references` -> `referenced` (with a `d` at the end)?

rdblue · 2022-11-27T22:31:36Z

I don't think that query column names are the right way to go. I think that the schema referenced by ID should be the pre-alias schema. That supports getting the pre-alias column names as well as more validation on data types and nested fields.
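To illustrate the extra validation a stored pre-alias schema allows beyond bare column names, here is a minimal sketch. `Field` and `validate_against_stored_schema` are hypothetical, and types are simplified to strings (nested types written like `struct<lat:double,lon:double>`); this is not Iceberg's schema API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    name: str
    type_name: str  # simplified: a rendered type such as "int" or "struct<...>"


def validate_against_stored_schema(stored: List[Field], current: List[Field]) -> List[str]:
    """Compare the current query output with the schema stored at view creation."""
    by_name: Dict[str, List[Field]] = {}
    for f in current:
        by_name.setdefault(f.name, []).append(f)

    problems: List[str] = []
    for f in stored:
        candidates = by_name.get(f.name, [])
        if len(candidates) != 1:
            # Missing or ambiguous: a names-only check can also catch this.
            problems.append(f"{f.name}: expected 1 column, found {len(candidates)}")
        elif candidates[0].type_name != f.type_name:
            # Type drift, including inside nested fields: names alone cannot catch this.
            problems.append(
                f"{f.name}: type changed from {f.type_name} to {candidates[0].type_name}"
            )
    return problems
```

For instance, if `id` was `int` at creation and is now `long`, a check based on names alone passes, while this comparison reports `id: type changed from int to long`. Extra columns not in the stored schema are ignored.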

Spec: Add query-column-names to SQL view representation in view spec

e06839d

jzhuge mentioned this pull request Nov 7, 2022

API: Add view interfaces #4925

Merged

wmoustafa reviewed Nov 7, 2022

dimas-b reviewed Nov 7, 2022

rdblue closed this Nov 27, 2022
