Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/parser: Walk never traverses subqueries in a FROM clause #9952

Closed
a-robinson opened this issue Oct 13, 2016 · 6 comments
Closed

sql/parser: Walk never traverses subqueries in a FROM clause #9952

a-robinson opened this issue Oct 13, 2016 · 6 comments
Assignees

Comments

@a-robinson
Copy link
Contributor

It looks like this is preventing things like normalization and type checking from taking effect on subqueries in a FROM clause, i.e. select foo from (SUBQUERY GOES HERE);. The parser.Walk code seems to assume that there's nothing of interest in FROM clauses other than AS OF expressions. I've verified with a dumb Visitor (that simply prints each Expr's type) that the Subquery is never reached when running Walk over such queries.

I can fix this up while resolving #9921, but I'm curious what approach you guys like. The most tempting option to me is making the TableExpr interface satisfy the Expr interface and adding Walk methods for the structs that implement TableExpr, which will make adding the DB name normalization walker for #9921 easier, but thinking about that use may be biasing me.

@knz @RaduBerinde @nvanbenschoten

@a-robinson a-robinson self-assigned this Oct 13, 2016
@a-robinson a-robinson added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 13, 2016
@RaduBerinde
Copy link
Member

We have a subquery visitor which finds all subqueries, executes them as separate statements, and replaces them with a VALUES node with the results.

@knz
Copy link
Contributor

knz commented Oct 13, 2016

Radu: I don't think we want to do that for the purpose of preparing the query string to be stored in the view descriptor.
For context, we are discussing how to best normalize the SQL string that defines a view before it is put in the view descriptor. In particular all unqualified names must be qualified with the database name at least.

Alex: we should distinguish the issues here:

  1. It is true that Walk does not recurse in the From clause.
  2. However it is not true that normalization and type checking does not happen in the arguments of From. See below for details.
  3. By the way I advise we not call a select clause as argument to FROM a "sub-query" for the purpose of our SQL execution. Even though it's called such in SQL-related literature, the semantics of a FROM argument and a sub-query that is part of an expression are so vastly different that we should not let the same term overlap for the two concepts. I have personally settled on "sub-clause" for FROM arguments and "sub-query" for select clauses in expressions. I'll use that below.

Regarding point (2) above. The semantic analysis is currently done on sub-clauses as part of the newPlan() recursion that creates the planNode. When a plain select clause is encountered, the Select() constructor is called. This then calls initFrom() to process the FROM clause. This then computes the data source based on the FROM arguments, see getDataSource() in data_source.go. If a sub-query is encountered there, it simply launches newPlan() on the sub-clause and uses the result as data source. The recursive newPlan() call in turn calls the Select() constructor on the sub-clause which causes all the semantic checks to happen for the sub-clause, including type checking, normalization etc.

So the problem you have for #9921 is really that all the normalization work, including table name normalization for views, is currently done as part of the planNode recursive construction. So you thought about invoking the Walk mechanism instead to do this normalization. This is fair, but perhaps you are under-estimating how much work is needed to do this properly. Namely the entire getDataSource() processing should be done during this recursion as well, because otherwise you cannot resolve names properly. (Name resolution is dependent on the results of getDataSource())
Just "making TableExpr an expr and hope for the best" is probably optimistic :)

So then where to go from here?

Just so you know we had a short discussion today at the office to try to tackle this issue, between others. There's actually a laundry list of issues similar to yours here that could really use a semantic analysis phase separate from the planNode construction. This analysis would resolve names between other things and annotate normalized names into the syntax tree, so that you can get the information you need directly for the view descriptor. This is basically the comprehensive extension of the idea you propose above. We can talk about it if you wish. However I fear it will take a few week before we get there.

I am not sure we can do much better in a short term solution. Extending Walk to resolve names would introduce code duplication with that the planNode constructor currently does, and even if that was acceptable it would still be some effort. I'll let the other guys chime in and perhaps even @petermattis would have an opinion on this.

@RaduBerinde
Copy link
Member

To be clear, I wasn't suggesting anything, I was just clarifying how existing stuff works.

@a-robinson a-robinson removed the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 13, 2016
@a-robinson
Copy link
Contributor Author

a-robinson commented Oct 13, 2016

I see, thanks for the explanation.

It seems strange to me that the use of Walk and planNode construction are as intermingled as they are already, but I'm presumably missing a lot of background context and haven't read all the code. Introducing more duplication there doesn't sound particularly appealing to me.

I take it that the Format() approach I mentioned in the other issue isn't feasible because it doesn't have enough info to properly resolve the table, right? I mean, we could pass a closure into Format that replicates the logic in getDataSource, but I don't know if that's going too far off the rails.

I think I'll need to talk to you about this unless someone has a different proposal.

@knz
Copy link
Contributor

knz commented Oct 13, 2016

The reason why things are the way they are now is organic growth from a much simpler starting point. It's fair to want something else and I think we've reached a tipping point this week with enough impetus gathered to make it happen.

@a-robinson
Copy link
Contributor Author

I'm going to close this, as the original issue was primarily a misunderstanding. I've sent out #10026 to try and prevent future misunderstandings of the sort. Feel free to reopen this if you want to adopt it for a related purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants