-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix analysis/planning when lambda arguments clash with relation's columns #9099
Fix analysis/planning when lambda arguments clash with relation's columns #9099
Conversation
I like #9090 more. |
{ | ||
if (!analysis.isColumnReference(expression)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when this is needed? Is there some query that fails or this is just fail-safe?
Maybe just add check state that expression is column reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed e.g. when r
is a column reference of type row(a: integer, ...)
and we try to get symbol for r.a
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this would happen and why it wasn't a problem before?
Scope shouldn't resolve non column reference fields with .tryResolveField(expression)
.
I guess now non-lambda scope is applied for lambda arg expression.
Maybe instead check that expression
is not lambda a argument reference (since this is what actually we want to avoid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be redoing analysis in the query planning. Why would I want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, using scope again here is redoing analysis, but we have to live with this for now (this method will be gone eventually). However, as I mentioned, scope resolves only column references. That is why this change is questionable. In fact, the intention was to prevent using incorrect (non-lambda) scope for lambda arg reference. Because of such, it would be more natural to check if expression is not a lambda reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. Except that it's not possible to just "check if expression is not a lambda reference" without redoing analysis, so I'd argue that the presented solution is more natural. Are we bike shedding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that it's not possible to just "check if expression is not a lambda reference"
What about adding method:
boolean Analysis::isLambdaArgumentReference(Expression expression) {
returns expression instanceof Identifier && lambdaArgumentReferences.contains(NodeRef.of((Identifier) expression));
}
which would be similar to isColumnReference(Expression expression)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this would work. But also it would also make logic more complex. Instead of "is column reference so resolvable with scope" it would be "is not lambda reference, so probably column reference so resolvable with scope".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "is column reference so resolvable with scope"
This is actually: "is expression that is column reference resolvable with scope (than only resolves column references)", but then I think: "hey, if it's not a column reference then it shouldn't be resolved by scope anyway, so why this additional check"
The entire planner and analyzer is very complex piece of code. The less WTF code the better (you experienced that too). The code here deserves comment at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here deserves comment at least.
Done.
Nah. #9090 reapplies part of analysis in |
@@ -1269,43 +1269,43 @@ private Context( | |||
this.nameToLambdaArgumentDeclarationMap = nameToLambdaArgumentDeclarationMap; | |||
} | |||
|
|||
public static Context notInLambda() | |||
static Context notInLambda() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can be private
{ | ||
return new Context(null, null); | ||
} | ||
|
||
public static Context inLambda(Map<String, LambdaArgumentDeclaration> nameToLambdaArgumentDeclarationMap) | ||
static Context inLambda(Map<String, LambdaArgumentDeclaration> nameToLambdaArgumentDeclarationMap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.. check rest of methods
@@ -282,11 +282,11 @@ private Type analyze(Expression expression, Scope scope, Context context) | |||
private class Visitor | |||
extends StackableAstVisitor<Type, Context> | |||
{ | |||
private final Scope scope; | |||
private final Scope baseScope; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is needed anymore, you can just use scope from Context
. Otherwise it's confusing which scope to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Scope
is used to resolve FieldReference
instances. Scope from context cannot be used for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From semantic point of view FieldReference
expressions should only be created for top level expressions (e.g: not in lambdas). Therefore context scope should be OK instead of using some tricky double scoping.
I guess the problem might be with analyzeExpressionsWithInputs
which is used in LocalExecutionPlanner
after SymbolToInputRewriter
. However, I think that there should be another kind of expression (e.g: ChannelReference
) for which type should be returned without using Scope
at all (see #7398, ChannelReference
would only be produced by SymbolToInputRewriter.)
Alternatively, ExpressionAnalyzer
might have optional Map<Integer, Type> inputMapping
that would be used for FieldReference
resolving (if present). The idea is that original (AST) FieldReferences
would be translated to symbols so FieldReferences
in LocalExecutionPlanner
are only created by SymbolToInputRewriter
. I think this is nicer temporary solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right about where the problematic FieldReference
instances come from. What you suggest sounds like a good direction for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least minimally you should fix ExpressionAnalyzer
with
Alternatively, ExpressionAnalyzer might have optional Map<Integer, Type> inputMapping that would be used for FieldReference resolving (if present). The idea is that original (AST) FieldReferences would be translated to symbols so FieldReferences in LocalExecutionPlanner are only created by SymbolToInputRewriter. I think this is nicer temporary solution.
This will allow to void double scoping in ExpressionAnalyzer
now and would be overall cleaner approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment explaining purpose of the ExpressionAnalyzer.Visitor#baseScope
field.
} | ||
Type returnType = process(node.getBody(), new StackableAstVisitorContext<>(Context.inLambda(context.getContext().getScope(), nameToLambdaArgumentDeclarationMap))); | ||
for (LambdaArgumentDeclaration lambdaArgument : lambdaArguments) { | ||
ResolvedField resolvedField = lambdaScope.resolveField(lambdaArgument, QualifiedName.of(lambdaArgument.getName().getValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can directly create field id here from RelationId.of(node)
and index of LambdaArgumentDeclaration
.
You can even use int stream for that, e.g:
IntStream.range(0, lambdaArguments.size())
.forEach(index -> fieldToLambdaArgumentDeclaration.put(new FieldId(RelationId.of(node), index));
, but for loop is also enough
Scope scope = context.getContext().getScope(); | ||
Optional<ResolvedField> resolvedField = scope.tryResolveField(node, qualifiedName); | ||
if (resolvedField.isPresent()) { | ||
if (!context.getContext().isInLambda() || !context.getContext().getFieldToLambdaArgumentDeclaration().containsKey(FieldId.from(resolvedField.get()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this special case is required as lambda won't resolve entire deference expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
rebased |
innerExpressionAnalyzer.setExpressionType(argument, getExpressionType(argument)); | ||
} | ||
} | ||
return innerExpressionAnalyzer.analyze(expression, scope, context.getContext().expectingLambda(types)).getTypeSignature(); | ||
return innerExpressionAnalyzer.analyze(expression, baseScope, context.getContext().expectingLambda(types)).getTypeSignature(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context.getContext().getScope()
? Definitely expectingLambda
preserves this.nameToLambdaArgumentDeclarationMap
in current master. Add test for this case (some kind of nested lambdas with reference to outer lambda argument)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the problem is with this Map<Integer, Type>
map that is used to resolve FieldReferences
(this information needs to be passed to innerExpressionAnalyzer
). #9099 (comment) might work here. It's strange though that no test failed (lack of proper test?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right Scope
is passed via context.getContext().expectingLambda(types)
expression, so this is correct.
.orElseThrow(() -> new IllegalStateException("No symbol mapping for node " + node)); | ||
if (analysis.isColumnReference(node)) { | ||
Optional<ResolvedField> resolvedField = rewriteBase.getScope().tryResolveField(node); | ||
if (resolvedField.isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just be check state: checkState(resolvedField.isPresent()
) as column references should be resolvable by corresponding scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they don't need to be. The check was here before and i think i tried to remove it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be strange if the if (resolvedField.isPresent()) {
is needed for tests to pass. If the expression is column reference then the scope here should be the one that resolved it during analysis. Subuqery expressions should be translated already and lambda scope doesn't resolve column references at all.
If the scope here is different than the one that resolved column reference during analysis then it's potentially semantically incorrect. Such different scope might not resolve field, but it might also (incorrectly) resolve field to a wrong field index.
In any case, we need to investigate if this issue is introduced by this PR or was already in the code. It could be that the if (resolvedField.isPresent()) {
is required and correct because it's just some edge case (e.g: part of ORDER BY scoping or some subquery is not supported and wasn't rewritten). Then we need to at least document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an edge-case related to order by planning in QueryPlanner
, needs to stay
This will be merged again after release is closed |
Fix lambda arguments handling in
ExpressionAnalyzer
and during query planning (TranslationMap
).Fixes #9023, #7784, #9025
cc @martint @sopel39