sql: make view descriptors depend on table/columns by IDs #15388

knz · 2017-04-26T22:51:02Z

(Needed for #12611 and #13968.)

This patch does what it says on the label: the descriptor still
contains a valid SQL query, but with all table names rewritten to use
numeric table references. These references stay hidden in the output
of SHOW CREATE VIEW until a referenced table gets a new schema.

For example:

 CREATE TABLE kv(k INT PRIMARY KEY, v INT);
 CREATE VIEW vx AS SELECT v AS x FROM kv;
 SHOW CREATE VIEW vx;
 +------+-----------------------------------------+
 | View |           CreateView                    |
 +------+-----------------------------------------+
 | vx   | CREATE VIEW vx AS SELECT v AS x FROM kv |
 +------+-----------------------------------------+
 (1 row)
 ALTER TABLE kv ADD COLUMN d INT;
 SHOW CREATE VIEW vx;
 +------+--------------------------------------------------------------+
 | View |                          CreateView                          |
 +------+--------------------------------------------------------------+
 | vx   | CREATE VIEW vx AS SELECT v AS x FROM [64(1, 2) AS kv (k, v)] |
 +------+--------------------------------------------------------------+
 (1 row)
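
As an illustration only, the numeric table reference in the second output can be thought of as a table ID, the IDs of the columns depended on, and the names that were in effect when the view was created. The Go sketch below uses hypothetical type and field names (`tableRef`, `ColumnIDs`, and so on); it is not the code from this PR, just one way such a reference could be rendered.

```go
package main

import (
	"fmt"
	"strings"
)

// tableRef is a hypothetical, simplified model of a numeric table reference.
// None of these names come from the CockroachDB code base.
type tableRef struct {
	TableID   int
	ColumnIDs []int
	Alias     string   // table name at view creation time
	ColNames  []string // column names at view creation time
}

// String renders the reference in the shape seen in the SHOW CREATE VIEW
// output above: [<table ID>(<column IDs>) AS <alias> (<column names>)].
func (r tableRef) String() string {
	ids := make([]string, len(r.ColumnIDs))
	for i, id := range r.ColumnIDs {
		ids[i] = fmt.Sprint(id)
	}
	return fmt.Sprintf("[%d(%s) AS %s (%s)]",
		r.TableID, strings.Join(ids, ", "), r.Alias, strings.Join(r.ColNames, ", "))
}

func main() {
	ref := tableRef{TableID: 64, ColumnIDs: []int{1, 2}, Alias: "kv", ColNames: []string{"k", "v"}}
	fmt.Println("CREATE VIEW vx AS SELECT v AS x FROM " + ref.String())
}
```

Because the alias and column names are stored next to the IDs, the rest of the query text can keep referring to kv, k and v unchanged.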

A side effect of this patch is that view definitions can now use star
expansion, i.e. CREATE VIEW kv_alias AS SELECT * FROM kv is now
possible and valid. (And there was much rejoicing.)

Another side effect, visible in the example above, is that SHOW CREATE VIEW now reveals the structural dependency (showing both the
ID and the original table name) when the table's name or column
definitions change.

This may or may not be desirable from a UX perspective. However, anyone
wishing to improve upon this should note that if the column list in
the table descriptor was altered after the view was created (e.g. to
add new columns, rename columns or remove columns not depended on by
the view), there is no way to print out the view definition
as valid SQL without using the table reference syntax. Consider
the example above. Suppose column "v" was renamed to "w": printing
create view vx as select v as x from kv would be
invalid because column "v" would no longer exist. Now suppose
"v" was renamed to "w" and a new unrelated column "v" was added:
printing create view vx as select v as x from kv as kv(v) would be
invalid as well, because the new column v is unrelated to the one the
view was intended to depend on. Given these obstacles, it is
advisable to let the numeric table reference show up in the output of
SHOW CREATE VIEW and document this behavior.
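
The rename-then-add scenario can be made concrete with a toy model. The Go sketch below is an assumption-laden illustration (the `column` type and both lookup helpers are hypothetical, not CockroachDB APIs): it shows that a by-name lookup silently lands on the new, unrelated column, while a by-ID lookup still finds the column the view was created against.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a column descriptor: the ID is
// stable across renames, the name is not.
type column struct {
	ID   int
	Name string
}

// lookupByName returns the ID of the column that currently carries name.
func lookupByName(cols []column, name string) (int, bool) {
	for _, c := range cols {
		if c.Name == name {
			return c.ID, true
		}
	}
	return 0, false
}

// lookupByID returns the current name of the column with the given ID.
func lookupByID(cols []column, id int) (string, bool) {
	for _, c := range cols {
		if c.ID == id {
			return c.Name, true
		}
	}
	return "", false
}

func main() {
	// kv(k, v) at view creation: the view depends on column ID 2 ("v").
	cols := []column{{1, "k"}, {2, "v"}}

	// ALTER TABLE kv RENAME COLUMN v TO w; ALTER TABLE kv ADD COLUMN v INT;
	cols[1].Name = "w"
	cols = append(cols, column{3, "v"})

	id, _ := lookupByName(cols, "v") // finds the new, unrelated column
	name, _ := lookupByID(cols, 2)   // finds the original column under its new name
	fmt.Println(id, name)            // prints "3 w"
}
```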

Although it would be correct to do so for newly created clusters, this
patch does not lift the restriction on ALTER that prevents it from
renaming tables or columns depended on by views. Lifting it would be
correct on new clusters because once the guarantee is enforced that
all view descriptors depend on table/columns by IDs, renaming becomes
possible (renaming preserves the ID). However, previously created
clusters may already contain view descriptors with by-name table
references, and allowing the tables or columns they reference to be
renamed would be unsound. The proper approach is to complement this
patch with a cluster migration which rewrites all existing view
descriptors. This work is left to a subsequent commit.

A last improvement brought by this PR is that column renames are now
properly visible in pg_catalog:

> CREATE VIEW v (x) AS SELECT k FROM kv
> SELECT definition FROM pg_catalog.pg_views WHERE viewname = 'v'
SELECT k AS x FROM (SELECT k FROM kv)

cc @a-robinson

cockroach-teamcity · 2017-04-26T22:51:07Z

This change is

RaduBerinde · 2017-04-26T23:13:09Z

Great stuff!

How would CREATE VIEW vx AS SELECT a AS x FROM kv AS foo(a,b); show up? CREATE VIEW vx AS SELECT a AS x FROM [13(1,2) AS kv(k,v)] AS foo(a,b) ? And if we rename kv(k,v) to bar(x,y) how would it show up?

One thing I don't quite understand: if I'm reading the change correctly, we look up the current table and column names when we print out the expression. Why wouldn't showing that (without the numeric references) be correct? In your example, suppose column "v" was renamed to "w"; we would display it with the new column name "w".

Review status: 0 of 16 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.

pkg/sql/testdata/logic_test/views, line 504 at r1 (raw file):

query II
SELECT * FROM s1 ORDER BY 1
----

Can we have some tests that rename tables/columns and then do SHOW CREATE VIEW again?

Comments from Reviewable

knz · 2017-04-26T23:25:25Z

root@:26257/t> CREATE VIEW vx AS SELECT a AS x FROM kv AS foo(a,b);
CREATE VIEW
root@:26257/t> SHOW CREATE VIEW vx;
+------+----------------------------------------------------------------------------+
| View |                                 CreateView                                 |
+------+----------------------------------------------------------------------------+
| vx   | CREATE VIEW vx AS SELECT a AS x FROM [64(1, 2) AS kv (k, v)] AS foo (a, b) |
+------+----------------------------------------------------------------------------+
(1 row)

Renames are not yet activated, as per the commit message (I haven't implemented the necessary migration yet). However, hypothetically:

root@:26257/t> CREATE VIEW vx AS SELECT a AS x FROM kv AS foo(a,b);
CREATE VIEW
root@:26257/t> ALTER TABLE kv ALTER COLUMN k RENAME TO x;
ALTER TABLE
root@:26257/t> ALTER TABLE kv ALTER COLUMN v RENAME TO y;
ALTER TABLE
root@:26257/t> ALTER TABLE kv RENAME TO bar;
ALTER TABLE
root@:26257/t> SHOW CREATE VIEW vx;
+------+----------------------------------------------------------------------------+
| View |                                 CreateView                                 |
+------+----------------------------------------------------------------------------+
| vx   | CREATE VIEW vx AS SELECT a AS x FROM [64(1, 2) AS kv (k, v)] AS foo (a, b) |
+------+----------------------------------------------------------------------------+
(1 row)

The "AS kv" clause is embedded into the view descriptor, for good reason: every potential other reference to the names "kv", "k" or "v" in the rest of the query cannot be (easily) rewritten.

To answer your other points:

> if I'm understanding the change correctly, we look up the current table and column names when we print out the expression.

No, that's not correct; the names are embedded alongside the IDs during view creation. The code for SHOW CREATE VIEW is unchanged, and is dumb (it simply prints out whatever it finds in the descriptor as-is).

> [Assuming SHOW CREATE could look up the "new" name from a numeric table reference] Why wouldn't showing that (without the numeric references) be correct?

That's because we do not have valid SQL syntax to display a non-numeric-reference table where the set of columns is different.

Again, taking the initial example (hypothetically):

CREATE TABLE kv(k INT PRIMARY KEY, v INT);
CREATE VIEW kv_alias AS SELECT * FROM kv;
ALTER TABLE kv ALTER COLUMN v RENAME TO w;
ALTER TABLE kv ADD COLUMN v INT;
SHOW CREATE VIEW kv_alias;

What do you expect to see?

CREATE VIEW kv_alias AS SELECT * FROM kv -> incorrect, v has changed definition
CREATE VIEW kv_alias AS SELECT * FROM kv AS kv(k, v) -> incorrect, mismatch in the number of columns (kv has 3 columns now)

Maybe you're thinking about constructing an ad-hoc subquery, CREATE VIEW kv_alias AS SELECT * FROM (SELECT k, w AS v FROM kv), but that would cause SHOW CREATE VIEW / CREATE VIEW not to round-trip properly (we check for that in multiple places).

RaduBerinde · 2017-04-26T23:54:32Z

I see. My thinking was that we could just replace the IDs with the current names, and we can do that even if we have to keep the AS old_name(old_cols,..) along with it. But now I understand that this notation not only allows use of IDs, but also allows projecting to a subset of columns (this is quite a departure from what FROM <table> can normally do).

This projection is only useful if we want to leave any * as is instead of pre-expanding it. Frankly I think things would be clearer if we just pre-expand it. If we did that, we could always just use the current table name and rename it with AS as necessary. So in your last example, I would expect CREATE VIEW kv_alias AS SELECT k,v FROM kv AS kv(k,v,unused1).

The ad-hoc subquery to do the projection would work too I guess, as long as we only include it when a projection is necessary. There wouldn't be any round-tripping issues unless there's a schema change in-between.

RaduBerinde · 2017-04-27T00:11:34Z

The code

Review status: 0 of 16 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

pkg/sql/data_source.go, line 443 at r2 (raw file):

// renameSource applies an AS clause to a data source.
func renameSource(
	src planDataSource, as parser.AliasClause, includeHidden bool,

is an AliasClause that includes hidden columns valid? Do we tolerate both in the syntax, depending on the number of columns?

knz · 2017-04-27T00:21:46Z

In a regular AS clause (from t AS x(y,z)) hidden columns are never taken into account; in a numeric table reference (from [123 as x(y,z)]), hidden columns are always taken into account.
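
To illustrate the distinction, here is a minimal Go sketch of positional column renaming with an `includeHidden` switch. It assumes a simplified column model; `col` and `applyAlias` are hypothetical names and this is not the actual `renameSource` implementation.

```go
package main

import "fmt"

// col is a hypothetical, simplified column: a name plus a hidden flag.
type col struct {
	Name   string
	Hidden bool
}

// applyAlias renames columns positionally using names. When includeHidden
// is false (regular AS clause), hidden columns are skipped and keep their
// names; when it is true (numeric table reference), they are renamed too.
// The input slice is left untouched.
func applyAlias(cols []col, names []string, includeHidden bool) ([]col, error) {
	out := make([]col, len(cols))
	copy(out, cols)
	next := 0
	for i := range out {
		if out[i].Hidden && !includeHidden {
			continue
		}
		if next == len(names) {
			break // fewer alias names than columns: the rest keep their names
		}
		out[i].Name = names[next]
		next++
	}
	if next < len(names) {
		return nil, fmt.Errorf("alias has %d column names, only %d columns available", len(names), next)
	}
	return out, nil
}

func main() {
	// A table with one visible column and one hidden column.
	cols := []col{{"a", false}, {"rowid", true}}

	_, err := applyAlias(cols, []string{"y", "z"}, false)
	fmt.Println("regular AS clause:", err)

	out, _ := applyAlias(cols, []string{"y", "z"}, true)
	fmt.Println("numeric reference:", out)
}
```

In this sketch the same two-name alias is rejected for the regular AS clause (only one visible column) and accepted for the numeric reference (two columns once the hidden one counts).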

I will look into doing the other thing you mentioned earlier.

a-robinson · 2017-04-27T15:56:39Z

Thanks for doing this, @knz! Let me know if you want a full review from me, otherwise I'm going to trust that @RaduBerinde found everything I would have and more.

Review status: 0 of 16 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

knz · 2017-05-02T17:54:15Z

Bumping to 1.1, too much cleanup to do here for today

knz · 2017-05-12T18:09:31Z

I'm kinda bummed about this PR, there's an unforeseen hurdle: when restoring a backup that was made prior to this change, the descriptor must be rewritten but it's not easy to handle this in the restore context.

A good way forward would be to introduce a new FormatVersion value for table descriptors, which is one beyond the current one, and indicates the descriptor is using structural dependencies. This way the backup code can mostly ignore the problem.

(It also solves the other problem which this PR had from the beginning: the lack of a migration story that would enable ALTER RENAME.)

However, doing so requires introducing a version upgrade path that doesn't exist currently: one that looks up other descriptors in the process of updating the current one. As the descriptor upgrade can occur in the lease code, I am very concerned that care must be taken so that the table descriptors used during the rename are "atomically considered" with the lease being taken on the upgraded descriptor. I do not know how to do this.

@jordanlewis do you think we could sit together at some point and look at this together? I believe you are more familiar with this code.

knz · 2017-07-07T18:24:11Z

@cuongdo I would like to resuscitate this PR, but I fear it needs concerted attention from people comfortable with the lease code, the descriptor code and perhaps the backup code.

knz · 2017-07-28T19:05:54Z

An update on this PR:

- with suggestions from @danhhz I was able to ensure descriptors are properly reassigned to new IDs upon restore
- this PR still misses a general migration step to update "old format" view queries in existing descriptors - I will do this next. Two separate migrations need to exist, both at cluster start-up and during restore (restoring an old-format desc onto a new-format db)

knz · 2017-07-28T19:06:17Z

Some tests failing due to base changes. Will also fix next.

knz · 2017-07-30T00:27:04Z

One step forward - the only remaining broken test is because of #17306, which I will fix shortly.

Prior to this patch, view dependency analysis during CREATE VIEW was broken because it would only check dependencies after plan optimization, i.e. possibly after some dependencies were lost. For example, with the queries:

```sql
CREATE VIEW v AS SELECT k FROM (SELECT k,v FROM kv) -- loses dependency on kv.v
CREATE VIEW v AS SELECT k,v FROM kv WHERE FALSE -- loses dependency on kv
```

This patch addresses the issue as follows:

- the dependencies are now collected during the initial construction of the query plan, before any optimizations are applied. This way we ensure the dependency tracking is complete.
- a migration is implemented that will fix any view descriptor and corresponding dependency information that was populated prior to this fix.

knz · 2017-08-02T00:41:54Z

cc @danhhz can you check I didn't do anything foolish in restore.go specifically?

danhhz · 2017-08-02T14:09:35Z

I only looked at restore.go. Left one comment

Review status: 0 of 44 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

pkg/ccl/sqlccl/restore.go, line 311 at r4 (raw file):

			}
		}
		if hasRestoredDep {

since we error in the second branch, this is equivalent to if len(table.DependsOn) > 0, right? I find that a little clearer

In the event that you strongly disagree and want to keep it like this, then please rename hasRestoredDep to something with View in it

This all also really wants a backup/restore test, but I'm fine leaving that for a followup given how big this PR is already

knz · 2017-08-04T12:46:39Z

@RaduBerinde @jordanlewis I am considering closing this PR in favor of another solution I discovered this week:

Instead of rewriting the view query as done in this PR:

- extend the view descriptor with a new field "Apparent Schema" containing a mapping of (db name, table name) to (table ID, list of visible column IDs+names, list of visible indexes+names), initialized at the moment the view is created
- extend the parameters of getDataSource() with a similar mapping
- populate the parameters of getDataSource() for further recursion each time a view descriptor is opened by getViewSource
- make getDataSource()/getTablePlan() first look up table names using the new mapping before going to the table cache / KV.

The advantage is that the view query would stay identical which would improve UX when looking at the view query after new columns have been added to the underlying views/tables.
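
The lookup order proposed above can be sketched as follows. This is a rough Go illustration under stated assumptions: `apparentSchema`, `catalog`, `qualifiedName` and `resolveTable` are hypothetical names, and the real getDataSource()/getTablePlan() signatures are not reproduced here.

```go
package main

import "fmt"

// qualifiedName identifies a table the way the view's query text spells it.
type qualifiedName struct{ DB, Table string }

// apparentTable is a hypothetical entry of the proposed "Apparent Schema":
// what a name meant at the moment the view was created.
type apparentTable struct {
	TableID   int
	ColumnIDs []int
	ColNames  []string
}

// apparentSchema maps names as written in the view query to their
// creation-time meaning.
type apparentSchema map[qualifiedName]apparentTable

// catalog stands in for the table cache / KV lookup: current name -> table ID.
type catalog map[qualifiedName]int

// resolveTable consults the view's apparent schema first and only falls
// back to the current catalog when the name is not covered by it.
func resolveTable(name qualifiedName, schema apparentSchema, cat catalog) (int, error) {
	if t, ok := schema[name]; ok {
		return t.TableID, nil
	}
	if id, ok := cat[name]; ok {
		return id, nil
	}
	return 0, fmt.Errorf("table %s.%s not found", name.DB, name.Table)
}

func main() {
	kv := qualifiedName{DB: "t", Table: "kv"}

	// The view was created when t.kv was table 64.
	schema := apparentSchema{kv: {TableID: 64, ColumnIDs: []int{1, 2}, ColNames: []string{"k", "v"}}}

	// Later, table 64 was renamed to t.bar and a new, unrelated t.kv (ID 70) was created.
	cat := catalog{qualifiedName{DB: "t", Table: "bar"}: 64, kv: 70}

	inView, _ := resolveTable(kv, schema, cat)
	outside, _ := resolveTable(kv, nil, cat)
	fmt.Println(inView, outside) // prints "64 70"
}
```

Under this scheme the stored query text never changes; only the name resolution performed while planning the view is redirected.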

What do you think?

RaduBerinde · 2017-08-04T13:11:35Z

I agree that would be better from a UX perspective. I don't know if there are any subtleties involved implementation-wise; I'll let @jordanlewis chime in.

knz · 2017-08-08T12:34:12Z

Closing in favor of #17501.

knz requested a review from RaduBerinde April 26, 2017 22:51

knz force-pushed the structural-views branch from 2b1eb2e to a2038c2 Compare April 26, 2017 23:02

knz force-pushed the structural-views branch from a2038c2 to 3cc6930 Compare April 26, 2017 23:13

knz added the docs-todo label Apr 27, 2017

knz mentioned this pull request Apr 27, 2017

sql: clean up and reduce the TableName public interface. #15030

Closed

knz added this to the 1.1 milestone May 2, 2017

knz mentioned this pull request May 2, 2017

sql: force column types in view plans (make view descriptors future-proof) #12611

Open

knz force-pushed the structural-views branch from 3cc6930 to c168748 Compare May 12, 2017 18:00

knz force-pushed the structural-views branch 2 times, most recently from b2377fe to 90370ee Compare May 12, 2017 19:18

knz mentioned this pull request Jul 7, 2017

sql: Support renaming objects depended on by views #10083

Open

knz mentioned this pull request Jul 14, 2017

sql: make database, table, view and columns lookups properly case-sensitive #16884

Merged

knz force-pushed the structural-views branch from 90370ee to c0dd44c Compare July 14, 2017 21:56

knz mentioned this pull request Jul 14, 2017

sql: make the table reference syntax carry its alias, if any #17031

Merged

knz force-pushed the structural-views branch 2 times, most recently from ffa0c99 to 28baa3c Compare July 14, 2017 22:22

knz force-pushed the structural-views branch from 28baa3c to f688ee6 Compare July 24, 2017 11:52

knz mentioned this pull request Jul 24, 2017

sql: homogeneize / abstract access to db / table descriptors #17188

Closed

knz force-pushed the structural-views branch 2 times, most recently from f589bc1 to efa474d Compare July 25, 2017 13:57

This was referenced Jul 27, 2017

sql: views do not track their dependencies properly #17269

Closed

sql: miscellaneous improvements prior to view improvements #17286

Merged

sql: un-break view descriptors by being more conservative about dependencies #17280

Merged

knz force-pushed the structural-views branch 4 times, most recently from bcf0cf5 to 3cacc5e Compare July 28, 2017 18:52

knz force-pushed the structural-views branch from 3cacc5e to 66dbb23 Compare July 30, 2017 00:27

This was referenced Jul 30, 2017

sql: views don't track their dependencies correctly (again...) #17306

Closed

sql: re-vamp the view dependency analysis #17310

Merged

knz force-pushed the structural-views branch 2 times, most recently from 8af350b to d9271e2 Compare August 1, 2017 22:37

knz added 2 commits August 2, 2017 00:22

knz force-pushed the structural-views branch from d9271e2 to ba28022 Compare August 2, 2017 00:39

knz requested a review from a team as a code owner August 2, 2017 00:39

jseldess mentioned this pull request Aug 7, 2017

sql: Make it possible to rename tables and columns underlying a view cockroachdb/docs#1573

Closed

This was referenced Aug 7, 2017

sql: small view descriptor clean-ups #17475

Closed

sql: check backup/restore of views defined over system descriptors? #17476

Closed

sql: make view descriptors depend on table/columns by IDs #17501

Draft

knz closed this Aug 8, 2017

knz deleted the structural-views branch April 27, 2018 18:53
