sql/parser: Walk never traverses subqueries in a FROM clause #9952

a-robinson · 2016-10-13T17:06:58Z

It looks like this is preventing things like normalization and type checking from taking effect on subqueries in a FROM clause, i.e. select foo from (SUBQUERY GOES HERE);. The parser.Walk code seems to assume that there's nothing of interest in FROM clauses other than AS OF expressions. I've verified with a dumb Visitor (that simply prints each Expr's type) that the Subquery is never reached when running Walk over such queries.
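A minimal sketch of the reported behavior, using toy types rather than the real parser package: `Expr`, `Select`, `Subquery`, `walkSelect`, and `visitedTypes` here are all hypothetical stand-ins. It assumes a walk that visits the projection list but skips FROM items, and shows that a type-recording visitor never reaches the nested SELECT.

```go
package main

import "fmt"

// Expr is a toy stand-in for the parser's expression interface.
type Expr interface{}

// Visitor is called once for every expression the walk reaches.
type Visitor func(Expr)

// ColumnRef is a toy column reference such as "foo".
type ColumnRef struct{ Name string }

// Subquery wraps a nested SELECT.
type Subquery struct{ Select *Select }

// Select is a toy SELECT statement: projections plus FROM items.
type Select struct {
	Exprs []Expr
	From  []Expr
}

// walkSelect mimics the reported behavior: it visits the projection
// list but never descends into the FROM items.
func walkSelect(v Visitor, s *Select) {
	for _, e := range s.Exprs {
		walkExpr(v, e)
	}
	// s.From is intentionally not traversed here.
}

// walkExpr visits e and descends into it if it is a Subquery.
func walkExpr(v Visitor, e Expr) {
	v(e)
	if sq, ok := e.(*Subquery); ok {
		walkSelect(v, sq.Select)
	}
}

// visitedTypes runs a "dumb" visitor that records each node's type.
func visitedTypes(s *Select) []string {
	var seen []string
	walkSelect(func(e Expr) { seen = append(seen, fmt.Sprintf("%T", e)) }, s)
	return seen
}

func main() {
	// Toy equivalent of: SELECT foo FROM (SELECT foo)
	inner := &Select{Exprs: []Expr{&ColumnRef{Name: "foo"}}}
	outer := &Select{
		Exprs: []Expr{&ColumnRef{Name: "foo"}},
		From:  []Expr{&Subquery{Select: inner}},
	}
	// Only the outer column reference is reported; the Subquery is not.
	fmt.Println(visitedTypes(outer))
}
```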

I can fix this up while resolving #9921, but I'm curious what approach you guys like. The most tempting option to me is making the TableExpr interface satisfy the Expr interface and adding Walk methods for the structs that implement TableExpr, which will make adding the DB name normalization walker for #9921 easier, but thinking about that use may be biasing me.
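The option described here could look roughly like the following sketch. It is not the actual parser code: the interfaces and structs are simplified, hypothetical versions, and the only point is that once `TableExpr` embeds `Expr`, every FROM item has to provide a `Walk` method, so a visitor reaches nested SELECTs.

```go
package main

import "fmt"

// Visitor receives every node reached during a walk.
type Visitor interface {
	Visit(e Expr)
}

// Expr is a toy expression interface; each node walks its own children.
type Expr interface {
	Walk(v Visitor)
}

// TableExpr embeds Expr, so every FROM item must implement Walk.
type TableExpr interface {
	Expr
	tableExpr()
}

// TableName is a plain table reference in FROM.
type TableName struct{ Name string }

func (t *TableName) Walk(v Visitor) { v.Visit(t) }
func (*TableName) tableExpr() {}

// Subquery is a nested SELECT used as a FROM item.
type Subquery struct{ Select *Select }

func (s *Subquery) Walk(v Visitor) {
	v.Visit(s)
	s.Select.Walk(v)
}
func (*Subquery) tableExpr() {}

// Select is a toy SELECT that only models its FROM items.
type Select struct {
	From []TableExpr
}

func (s *Select) Walk(v Visitor) {
	v.Visit(s)
	for _, t := range s.From {
		t.Walk(v)
	}
}

// typeRecorder records the dynamic type of every visited node.
type typeRecorder struct{ seen []string }

func (r *typeRecorder) Visit(e Expr) {
	r.seen = append(r.seen, fmt.Sprintf("%T", e))
}

func main() {
	// Toy equivalent of: SELECT ... FROM (SELECT ... FROM t)
	inner := &Select{From: []TableExpr{&TableName{Name: "t"}}}
	q := &Select{From: []TableExpr{&Subquery{Select: inner}}}
	r := &typeRecorder{}
	q.Walk(r)
	fmt.Println(r.seen)
}
```

With this shape, a walker that rewrites table names would only need to handle `*TableName` in `Visit`. It would still lack the name-resolution context that planning provides.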

@knz @RaduBerinde @nvanbenschoten

RaduBerinde · 2016-10-13T17:36:29Z

We have a subquery visitor which finds all subqueries, executes them as separate statements, and replaces them with a VALUES node with the results.

knz · 2016-10-13T17:41:23Z

Radu: I don't think we want to do that for the purpose of preparing the query string to be stored in the view descriptor.
For context, we are discussing how to best normalize the SQL string that defines a view before it is put in the view descriptor. In particular all unqualified names must be qualified with the database name at least.

Alex: we should distinguish the issues here:

1. It is true that Walk does not recurse into the FROM clause.
2. However, it is not true that normalization and type checking do not happen in the arguments of FROM. See below for details.
3. By the way, I advise that we not call a select clause used as an argument to FROM a "sub-query" for the purpose of our SQL execution. Even though it's called such in SQL-related literature, the semantics of a FROM argument and a sub-query that is part of an expression are so vastly different that we should not let the same term overlap for the two concepts. I have personally settled on "sub-clause" for FROM arguments and "sub-query" for select clauses in expressions. I'll use that below.

Regarding point (2) above: the semantic analysis is currently done on sub-clauses as part of the newPlan() recursion that creates the planNode. When a plain select clause is encountered, the Select() constructor is called. This calls initFrom() to process the FROM clause, which in turn computes the data source based on the FROM arguments; see getDataSource() in data_source.go. If a sub-clause is encountered there, it simply launches newPlan() on the sub-clause and uses the result as the data source. The recursive newPlan() call in turn calls the Select() constructor on the sub-clause, which causes all the semantic checks to happen for the sub-clause, including type checking, normalization, etc.

So the problem you have for #9921 is really that all the normalization work, including table name normalization for views, is currently done as part of the recursive planNode construction. So you thought about invoking the Walk mechanism instead to do this normalization. This is fair, but perhaps you are underestimating how much work is needed to do this properly. Namely, the entire getDataSource() processing should be done during this recursion as well, because otherwise you cannot resolve names properly. (Name resolution is dependent on the results of getDataSource().)
Just "making TableExpr an Expr and hoping for the best" is probably optimistic :)

So then where to go from here?

Just so you know, we had a short discussion today at the office to try to tackle this issue, among others. There's actually a laundry list of issues similar to yours here that could really use a semantic analysis phase separate from the planNode construction. This analysis would resolve names, among other things, and annotate normalized names into the syntax tree, so that you can get the information you need directly for the view descriptor. This is basically the comprehensive extension of the idea you propose above. We can talk about it if you wish. However, I fear it will take a few weeks before we get there.
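As a rough illustration of what such a separate pass might do for the view use case, the sketch below qualifies unqualified table names in a toy syntax tree and then prints the tree back as SQL. Everything in it (`TableRef`, `Select`, `qualifyNames`, the database name `db`) is hypothetical, and real name resolution would need far more context than a single current-database string.

```go
package main

import (
	"fmt"
	"strings"
)

// TableRef is a toy FROM item: a possibly unqualified table name, or a
// nested SELECT when Sub is non-nil.
type TableRef struct {
	Database string // empty when the name was written unqualified
	Table    string
	Sub      *Select
}

// Select is a toy SELECT statement.
type Select struct {
	Columns []string
	From    []*TableRef
}

// qualifyNames annotates the tree in place: every unqualified table
// name gets currentDB as its database, including inside nested SELECTs.
func qualifyNames(s *Select, currentDB string) {
	for _, t := range s.From {
		if t.Sub != nil {
			qualifyNames(t.Sub, currentDB)
			continue
		}
		if t.Database == "" {
			t.Database = currentDB
		}
	}
}

// String renders the (possibly annotated) tree back to SQL text.
func (s *Select) String() string {
	from := make([]string, len(s.From))
	for i, t := range s.From {
		switch {
		case t.Sub != nil:
			from[i] = "(" + t.Sub.String() + ")"
		case t.Database != "":
			from[i] = t.Database + "." + t.Table
		default:
			from[i] = t.Table
		}
	}
	return "SELECT " + strings.Join(s.Columns, ", ") + " FROM " + strings.Join(from, ", ")
}

func main() {
	// Toy equivalent of: SELECT foo FROM (SELECT foo FROM t)
	inner := &Select{Columns: []string{"foo"}, From: []*TableRef{{Table: "t"}}}
	outer := &Select{Columns: []string{"foo"}, From: []*TableRef{{Sub: inner}}}
	qualifyNames(outer, "db")
	fmt.Println(outer) // SELECT foo FROM (SELECT foo FROM db.t)
}
```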

I am not sure we can do much better with a short-term solution. Extending Walk to resolve names would introduce code duplication with what the planNode constructor currently does, and even if that were acceptable it would still be some effort. I'll let the other guys chime in, and perhaps even @petermattis would have an opinion on this.

RaduBerinde · 2016-10-13T17:43:44Z

To be clear, I wasn't suggesting anything, I was just clarifying how existing stuff works.

a-robinson · 2016-10-13T18:10:16Z

I see, thanks for the explanation.

It seems strange to me that the use of Walk and planNode construction are as intermingled as they are already, but I'm presumably missing a lot of background context and haven't read all the code. Introducing more duplication there doesn't sound particularly appealing to me.

I take it that the Format() approach I mentioned in the other issue isn't feasible because it doesn't have enough info to properly resolve the table, right? I mean, we could pass a closure into Format that replicates the logic in getDataSource, but I don't know if that's going too far off the rails.

I think I'll need to talk to you about this unless someone has a different proposal.

knz · 2016-10-13T18:13:10Z

The reason why things are the way they are now is organic growth from a much simpler starting point. It's fair to want something else and I think we've reached a tipping point this week with enough impetus gathered to make it happen.

a-robinson · 2016-10-17T13:56:13Z

I'm going to close this, as the original issue was primarily a misunderstanding. I've sent out #10026 to try and prevent future misunderstandings of the sort. Feel free to reopen this if you want to adopt it for a related purpose.

a-robinson self-assigned this Oct 13, 2016

a-robinson added the C-bug label Oct 13, 2016

a-robinson removed the C-bug label Oct 13, 2016

a-robinson closed this as completed Oct 17, 2016
