Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add names to args at invocation and allow omitting optional args #342

Merged
merged 1 commit into from
Nov 14, 2022

Conversation

westonpace
Copy link
Member

BREAKING CHANGE: optional arguments may now be omitted. However, if they are specified then they MUST be named.

@westonpace
Copy link
Member Author

Hopefully this breaking change won't be quite so contentious. I imagine there are other ways we could implement this as well (e.g. we could require all arguments to be named). Or, if we must, we could open the door to optional argument inference strategies (though I'd rather not go down that road). I'm open to alternative suggestions.

I'm filing this because it is becoming a problem in Acero. Acero currently expects all arguments (including optional arguments) to always be specified (I believe this is the validator's interpretation as well). Unfortunately, neither the Isthmus or DuckDb producers behave in this way.

As an example:

isthmus -c "CREATE TABLE T1(foo int)" "SELECT foo + 1 FROM T1"
{
  ...
          "expressions": [{
            "scalarFunction": {
              ...
              "arguments": [{"value": { ... }}, {"value": { ... }}], // Missing overflow
  ...

}

@westonpace
Copy link
Member Author

As an added bonus, and something that was brought up in the last community sync, this change should enable new optional arguments to be added to a function without breaking backwards compatibility.

@jacques-n
Copy link
Contributor

I'm strongly inclined to pull the options out of the arguments array. It's a bigger breaking change but seems cleaner. When @jvanstraten mentioned names on the sync, that was more what I was expecting.

@westonpace
Copy link
Member Author

I'm strongly inclined to pull the options out of the arguments array.

That's fine with me. I don't think it's any more work either way for us. Proto3 actually supports maps. Would we want to express the named options as a map then?

@jvanstraten
Copy link
Contributor

jvanstraten commented Sep 26, 2022

I'm strongly inclined to pull the options out of the arguments array. It's a bigger breaking change but seems cleaner. When @jvanstraten mentioned names on the sync, that was more what I was expecting.

+1, that's what I was intending, but I just hadn't gotten around to proposing something yet between all the reviewing and validator work. Likewise for lambdas by the way.

Longer-term, if we end up embracing the meta-type system that formed as part of my type derivation work, required enumeration arguments and type arguments could both be naturally expressed with a metavalue. You'd also get static boolean, integer, and string arguments for free, all usable within the type derivation system. They'd basically be the generalization of what most languages call template or generic arguments.

I'd then be inclined to just pull them into a different repeated as well, so you'd end up with something like

// A scalar function call.
message ScalarFunction {
  // Points to a function_anchor defined in this plan, which must refer
  // to a scalar function in the associated YAML file. Required; avoid
  // using anchor/reference zero.
  uint32 function_reference = 1;
  
  // Options to specify behavior for corner cases, or leave behavior unspecified
  // if not necessary.
  repeated FunctionOption options = 2;
  message FunctionOption {
    // Name of the option to set. If the consumer does not recognize the
    // option, it must reject the plan. The name is matched case-insensitively.
    string name = 1;
    
    // List of behavior options allowed by the producer. At least one must be
    // specified; to leave an option unspecified, simply don't add an entry to
    // `options`. The consumer must use the first option from the list that it
    // supports. If the consumer supports none of the specified options, it
    // must reject the plan. The name is matched case-insensitively.
    repeated string preference = 2;
  }
  
  // Generic parameters to configure the function statically. Must match the
  // signature of the function exactly.
  repeated Metavalue parameters = 3;
  
  // Runtime arguments passed to the function. The data types of the values
  // returned by the expressions must match the signature of the function
  // exactly. Constant arguments must be mapped to a Literal expression.
  repeated Expression arguments = 4;
  
  // Must be (redundantly) set to the return type of the function, exactly as
  // derived using the declaration in the extension.
  Type output_type = 5;
}

where Metavalue looks something like this, but without the validator-specific unresolved option.

(note on case insensitivity: I dislike it personally, but dislike inconsistency more)

Proto3 actually supports maps. Would we want to express the named options as a map then?

I'd prefer not to because A) I'm not sure how prost and other third-party protobuf implementations would deal with it (if at all) and B) it's essentially just sugar for

message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;

anyway (as copypasted from the protobuf docs).


An important thing we need to change here still is that optional enumeration arguments need to be removed from the compound name signatures of functions, otherwise adding such arguments would still be a breaking change. Also it seems like the ability to specify multiple allowed values for an option that we discussed during the sync meeting is missing from the current proposal.

@westonpace westonpace force-pushed the feat/omit-optional-args branch 6 times, most recently from f4cda23 to 3addc63 Compare September 29, 2022 06:26
@westonpace
Copy link
Member Author

I've switched to the message format proposed by @jvanstraten and updated the spec around function signatures to specifically mention that options are not part of the signature.

jvanstraten
jvanstraten previously approved these changes Sep 29, 2022
Copy link
Contributor

@jvanstraten jvanstraten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation and semantics look good to me. I'm neutral about the documentation; I think it covers all the semantics but I think it might be easy to miss how different the various argument types are and behave, especially because the "argument type" concept doesn't translate well to other languages (at least none that I know). This is not necessarily a critique of this PR but more of the function docs in general though, and that can be improved later.

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My expectation here is this should also include a change to the yaml schema where optional arguments are not specified in the same way as other argument types. I don't see that identified here nor do I see any change to the yaml schema.

proto/substrait/algebra.proto Outdated Show resolved Hide resolved
@CLAassistant
Copy link

CLAassistant commented Oct 6, 2022

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@westonpace
Copy link
Member Author

My expectation here is this should also include a change to the yaml schema where optional arguments are not specified in the same way as other argument types. I don't see that identified here nor do I see any change to the yaml schema.

@jacques-n That was a significantly more extensive change. PTAL and let me know if this is what you were thinking.

@westonpace westonpace force-pushed the feat/omit-optional-args branch 2 times, most recently from 30197eb to 53c1c21 Compare October 14, 2022 06:28
@richtia
Copy link
Member

richtia commented Oct 14, 2022

@pdet

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good direction. Couple of couples...

extensions/functions_aggregate_generic.yaml Show resolved Hide resolved
extensions/functions_datetime.yaml Outdated Show resolved Hide resolved
site/docs/expressions/scalar_functions.md Outdated Show resolved Hide resolved
site/docs/expressions/scalar_functions.md Outdated Show resolved Hide resolved
site/docs/extensions/generate_function_docs.py Outdated Show resolved Hide resolved
site/docs/expressions/scalar_functions.md Outdated Show resolved Hide resolved
site/docs/expressions/scalar_functions.md Show resolved Hide resolved
@bkietz
Copy link
Contributor

bkietz commented Oct 19, 2022

Other than these nits, LGTM

}
reserved 2;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're breaking backwards compatibility anyway, let's simplify to:

message FunctionArgument {
  oneof arg_type {
    string enum = 1;
    Type type = 2;
    Expression value = 3;
  }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacques-n Any opinion on this suggestion? This would get rid of message Enum entirely which seems a lot simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud @jacques-n friendly ping. I feel this PR has stalled a bit and I would like to get it through. However, we should evaluate this carefully as I don't think we can easily revisit in the future. Any second opinions on this suggestion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud @jacques-n ping again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty big change, but I am in favor of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made this change.

BREAKING CHANGE: optional arguments are no longer allowed to be specified
as a part of FunctionArgument messages.  Instead they are now specified
separately as part of the function invocation.

BREAKING CHANGE: optional arguments are now specified separately from
required arguments in the YAML specification.

Co-authored-by: Benjamin Kietzman <[email protected]>
@westonpace
Copy link
Member Author

I was going to interpret this comment as a second SMC approval and merge this :)

However, it seems I cannot merge while a review is left in the "change requested" state. @jacques-n do you know if you will be able to look at this anytime soon?

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. @westonpace , any chance someone on your team can pick up the downstream changes for validator and java?

@westonpace
Copy link
Member Author

any chance someone on your team can pick up the downstream changes for validator and java?

I'll see what I can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants