Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix analysis/planning when lambda arguments clash with relation's columns #9099

Merged
merged 5 commits into from
Nov 2, 2017
Merged

Fix analysis/planning when lambda arguments clash with relation's columns #9099

merged 5 commits into from
Nov 2, 2017

Conversation

findepi
Copy link
Contributor

@findepi findepi commented Oct 4, 2017

Fix lambda arguments handling in ExpressionAnalyzer and during query planning (TranslationMap).

Fixes #9023, #7784, #9025

cc @martint @sopel39

@findepi
Copy link
Contributor Author

findepi commented Oct 4, 2017

I like #9090 more.

{
if (!analysis.isColumnReference(expression)) {
Copy link
Contributor

@sopel39 sopel39 Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when this is needed? Is there some query that fails or this is just fail-safe?
Maybe just add check state that expression is column reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed e.g. when r is a column reference of type row(a: integer, ...) and we try to get symbol for r.a.

Copy link
Contributor

@sopel39 sopel39 Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this would happen and why it wasn't a problem before?
Scope shouldn't resolve non column reference fields with .tryResolveField(expression).

I guess now non-lambda scope is applied for lambda arg expression.
Maybe instead check that expression is not lambda a argument reference (since this is what actually we want to avoid).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be redoing analysis in the query planning. Why would I want that?

Copy link
Contributor

@sopel39 sopel39 Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, using scope again here is redoing analysis, but we have to live with this for now (this method will be gone eventually). However, as I mentioned, scope resolves only column references. That is why this change is questionable. In fact, the intention was to prevent using incorrect (non-lambda) scope for lambda arg reference. Because of such, it would be more natural to check if expression is not a lambda reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. Except that it's not possible to just "check if expression is not a lambda reference" without redoing analysis, so I'd argue that the presented solution is more natural. Are we bike shedding?

Copy link
Contributor

@sopel39 sopel39 Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that it's not possible to just "check if expression is not a lambda reference"

What about adding method:

boolean Analysis::isLambdaArgumentReference(Expression expression) {
  returns expression instanceof Identifier && lambdaArgumentReferences.contains(NodeRef.of((Identifier) expression));
}

which would be similar to isColumnReference(Expression expression)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this would work. But also it would also make logic more complex. Instead of "is column reference so resolvable with scope" it would be "is not lambda reference, so probably column reference so resolvable with scope".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "is column reference so resolvable with scope"

This is actually: "is expression that is column reference resolvable with scope (than only resolves column references)", but then I think: "hey, if it's not a column reference then it shouldn't be resolved by scope anyway, so why this additional check"

The entire planner and analyzer is very complex piece of code. The less WTF code the better (you experienced that too). The code here deserves comment at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here deserves comment at least.

Done.

@sopel39
Copy link
Contributor

sopel39 commented Oct 4, 2017

I like #9090 more.

Nah. #9090 reapplies part of analysis in TranslationMap while you already have information which Node is column reference or lambda argument reference. In a way #9090 applies redundant logic and contains intermediate (unnecessary) step (creating scope for lambda analysis in translation map). This one is cleaner since it relies (in planning) on already performed analysis

@findepi findepi requested a review from martint October 6, 2017 17:15
@findepi findepi changed the title Fix planning when lambda argument shadows relation column Fix analysis/planning when lambda arguments clash with relation's columns Oct 11, 2017
@@ -1269,43 +1269,43 @@ private Context(
this.nameToLambdaArgumentDeclarationMap = nameToLambdaArgumentDeclarationMap;
}

public static Context notInLambda()
static Context notInLambda()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be private

{
return new Context(null, null);
}

public static Context inLambda(Map<String, LambdaArgumentDeclaration> nameToLambdaArgumentDeclarationMap)
static Context inLambda(Map<String, LambdaArgumentDeclaration> nameToLambdaArgumentDeclarationMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.. check rest of methods

@@ -282,11 +282,11 @@ private Type analyze(Expression expression, Scope scope, Context context)
private class Visitor
extends StackableAstVisitor<Type, Context>
{
private final Scope scope;
private final Scope baseScope;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is needed anymore, you can just use scope from Context. Otherwise it's confusing which scope to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Scope is used to resolve FieldReference instances. Scope from context cannot be used for that.

Copy link
Contributor

@sopel39 sopel39 Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From semantic point of view FieldReference expressions should only be created for top level expressions (e.g: not in lambdas). Therefore context scope should be OK instead of using some tricky double scoping.

I guess the problem might be with analyzeExpressionsWithInputs which is used in LocalExecutionPlanner after SymbolToInputRewriter. However, I think that there should be another kind of expression (e.g: ChannelReference) for which type should be returned without using Scope at all (see #7398, ChannelReference would only be produced by SymbolToInputRewriter.)

Alternatively, ExpressionAnalyzer might have optional Map<Integer, Type> inputMapping that would be used for FieldReference resolving (if present). The idea is that original (AST) FieldReferences would be translated to symbols so FieldReferences in LocalExecutionPlanner are only created by SymbolToInputRewriter. I think this is nicer temporary solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right about where the problematic FieldReference instances come from. What you suggest sounds like a good direction for the future.

Copy link
Contributor

@sopel39 sopel39 Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least minimally you should fix ExpressionAnalyzer with

Alternatively, ExpressionAnalyzer might have optional Map<Integer, Type> inputMapping that would be used for FieldReference resolving (if present). The idea is that original (AST) FieldReferences would be translated to symbols so FieldReferences in LocalExecutionPlanner are only created by SymbolToInputRewriter. I think this is nicer temporary solution.

This will allow to void double scoping in ExpressionAnalyzer now and would be overall cleaner approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment explaining purpose of the ExpressionAnalyzer.Visitor#baseScope field.

}
Type returnType = process(node.getBody(), new StackableAstVisitorContext<>(Context.inLambda(context.getContext().getScope(), nameToLambdaArgumentDeclarationMap)));
for (LambdaArgumentDeclaration lambdaArgument : lambdaArguments) {
ResolvedField resolvedField = lambdaScope.resolveField(lambdaArgument, QualifiedName.of(lambdaArgument.getName().getValue()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can directly create field id here from RelationId.of(node) and index of LambdaArgumentDeclaration.
You can even use int stream for that, e.g:

IntStream.range(0, lambdaArguments.size())
  .forEach(index -> fieldToLambdaArgumentDeclaration.put(new FieldId(RelationId.of(node), index)); 

, but for loop is also enough

Scope scope = context.getContext().getScope();
Optional<ResolvedField> resolvedField = scope.tryResolveField(node, qualifiedName);
if (resolvedField.isPresent()) {
if (!context.getContext().isInLambda() || !context.getContext().getFieldToLambdaArgumentDeclaration().containsKey(FieldId.from(resolvedField.get()))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this special case is required as lambda won't resolve entire deference expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@findepi findepi dismissed sopel39’s stale review October 20, 2017 12:46

comments answered/applied

@findepi
Copy link
Contributor Author

findepi commented Oct 23, 2017

rebased

innerExpressionAnalyzer.setExpressionType(argument, getExpressionType(argument));
}
}
return innerExpressionAnalyzer.analyze(expression, scope, context.getContext().expectingLambda(types)).getTypeSignature();
return innerExpressionAnalyzer.analyze(expression, baseScope, context.getContext().expectingLambda(types)).getTypeSignature();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context.getContext().getScope()? Definitely expectingLambda preserves this.nameToLambdaArgumentDeclarationMap in current master. Add test for this case (some kind of nested lambdas with reference to outer lambda argument)?

Copy link
Contributor

@sopel39 sopel39 Nov 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the problem is with this Map<Integer, Type> map that is used to resolve FieldReferences (this information needs to be passed to innerExpressionAnalyzer). #9099 (comment) might work here. It's strange though that no test failed (lack of proper test?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right Scope is passed via context.getContext().expectingLambda(types) expression, so this is correct.

.orElseThrow(() -> new IllegalStateException("No symbol mapping for node " + node));
if (analysis.isColumnReference(node)) {
Optional<ResolvedField> resolvedField = rewriteBase.getScope().tryResolveField(node);
if (resolvedField.isPresent()) {
Copy link
Contributor

@sopel39 sopel39 Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be check state: checkState(resolvedField.isPresent()) as column references should be resolvable by corresponding scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they don't need to be. The check was here before and i think i tried to remove it..

Copy link
Contributor

@sopel39 sopel39 Nov 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be strange if the if (resolvedField.isPresent()) { is needed for tests to pass. If the expression is column reference then the scope here should be the one that resolved it during analysis. Subuqery expressions should be translated already and lambda scope doesn't resolve column references at all.

If the scope here is different than the one that resolved column reference during analysis then it's potentially semantically incorrect. Such different scope might not resolve field, but it might also (incorrectly) resolve field to a wrong field index.

In any case, we need to investigate if this issue is introduced by this PR or was already in the code. It could be that the if (resolvedField.isPresent()) { is required and correct because it's just some edge case (e.g: part of ORDER BY scoping or some subquery is not supported and wasn't rewritten). Then we need to at least document it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an edge-case related to order by planning in QueryPlanner, needs to stay

@sopel39
Copy link
Contributor

sopel39 commented Nov 2, 2017

This will be merged again after release is closed

@findepi
Copy link
Contributor Author

findepi commented Nov 2, 2017

@sopel39 , to facilitate (testing and) merging once release is out, i created a new PR: #9269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants