-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ProcessFunction
: Add support for variadic arguments
#5691
ProcessFunction
: Add support for variadic arguments
#5691
Conversation
Hey @sphuber do you have a specific use case in mind for this? I would point to the zen python "explicit is better than implicit" and ask why can't we just allow for kwargs and avoid such implicit behaviour? |
See #1172 . Apparently there were multiple use-cases from users. I don't have one personally but I can definitely see the point. I don't see a strong reason not to allow it. I mean, Python itself supports variadic arguments, so why would we prohibit it? And finally, users are not forced to use it. If they prefer to be more explicit with positional and keyword arguments, they can do so without any issues. Together with #5688 to support automatic base type input serialization, I think process functions can become even easier to use. You would be able to
|
By the way, once agreed upon the interface, I will update the documentation. |
a902244
to
afe4e74
Compare
This is not a use case to use varadic arguments though, its an argument to use a list of nodes an input, and similarly we should make it easy to use a dict of nodes as an input. There is an obvious way how to specify in the function construction, what the inputs should be (also allowing for type validation): type annotations: @calcfunction
def sum_integers(integers: list[orm.Int]) -> orm.Int:
return orm.Int(sum(*(i.value for i in integers))) You could even extend this kind of type validation further in the future with: https://docs.python.org/3/library/typing.html#typing.Annotated Secondly, with the use of input/output lists and dicts, it should be easier to "reconstruct" them. I think it might be beneficial to add a metadata colmn, where you could signify that this link is part of a list or dict, and the index/key for that list/dict |
Thanks for working on this! I spent quite some time banging my hand against this restriction. 🤕
Just as an example of my use case, functionality like this is needed if the number of inputs is not known beforehand. Right now instead of simple calcfunction I need to use a WorkChain with an input with dynamic namespace. Here's an example where I simply want to combine multiple StructureData node into a single TrajectoryData node. (happy to hear if there's an easier way) |
Thanks for the input @danielhollas . You could in principle use @calcfunction
def structures_trajectory(**structures):
return TrajectoryData([structure for structure in structures.values()])
structures = {
'structure_a': StructureData(),
'structure_b': StructureData(),
'structure_c': StructureData(),
}
structures_trajectory(**structures) With the proposed new syntax, one could do the following: @calcfunction
def structures_trajectory(*structures):
return TrajectoryData(structures)
structures = (StructureData(), StructureData())
structures_trajectory(*structures) I personally think both are fine. As you can see the current existing solution is quite reasonable, it is just that quite a number of users were stumped to find out that varargs are not supported when that is possible in normal Python. I think it couldn't hurt to support the option for those who want it. |
3c8b81f
to
b7824c4
Compare
I agree with Chris that the underlying feature people are looking for here is not so much variadic arguments but being able to work with, and seamlessly pass around compounds (lists, dicts, ...) of AiiDA objects. That the provenance graph does not have a concept of a list of objects (or, in other words, that you can't have an As for the specific issue in this thread, given that we need to create named links for function inputs, I think requiring inputs to functions to be named is reasonable; I think it just needs to be well documented with examples for canonical use cases like the one above. |
@ltalirz I agree that ideally lists and dictionaries of inputs would be supported, but it is not clear how much work it would take to implement this. Is there a real objection to add this functionality in the meantime? Is there a real downside? Adding this now doesn't prevent us from adding direct support for lists and dictionaries later on, would it? |
b7824c4
to
571577f
Compare
My objection would be that the logic in the implementation and the labels it generates are non-obvious. Once it is in aiida-core, users can start relying on it, and it becomes difficult to change. An alternative stop-gap solution would be to replace the error message
Anyhow, I'm not going to block the PR if people feel strongly that support for |
Yeh this was kinda my feeling also 😬 |
Oh cool! I didn't know that! Thanks so much @sphuber Nevertheless, I'd say it is still weird to use a dictionary for passing in a list, imo weirder than having surprising link labels, but I don't know enough about how the link labels are generally used by users. In any case,
This would go a long way tbh, please implement this regardless of the result of the discussion here. 😊 |
I'd also say that calcfunctions, workfunctions are so much more lightweight than CalcJobs / WorkChains, so making them as useful and as painless to use is a goal worth pursuing. |
For me it is also not crucial to have this functionality, I have personally never felt the need, but that is probably because I, as a developer, was aware of the limitation and the alternatives. I simply picked up the existing issue since I worked on the process functions anyway (for the automatic input serialization) and since there seemed to be multiple people asking for it, I just implemented it.
This was exactly my thinking. In most cases, people don't even really care about the generated link labels, but in this case, having Anyway, I think all has been said what needs to be said. I propose I quickly present this during the next AiiDA meeting and then we can see what other people think. If they share your concern, then I am perfectly fine with leaving it as is (with an update of the docs explaining this use-case and pointing to the |
Thanks @sphuber, and all for the feedback. Here is my take.
So, I would be OK with merging this, with one change: I don't like names being created with UUID and in a non-reproducible/understandable way. I suggest that if a potential name clash is found (e.g. one has |
571577f
to
22c59dc
Compare
@@ -103,6 +99,24 @@ Note that the inputs **have to be passed as keyword arguments** because they are | |||
If the inputs would simply have been passed as positional arguments, the engine could have impossibly determined what label to use for the links that connect the input nodes with the calculation function node. | |||
For this reason, invoking a 'dynamic' function, i.e. one that supports ``**kwargs`` in its signature, with more positional arguments that explicitly named in the signature, will raise a ``TypeError``. | |||
|
|||
.. versionadded:: 2.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version should be updated?
Thanks! Please udpate the versionadded (see my other comment) and also the main PR message (it still mentions UUIDs). Then consider it approved by me! |
Up till now, variadic arguments, i.e., arguments defined as `*args` in a function signature which collects any remaining positional arguments, were not supported for process functions. The main reason was that it wasn't immediately clear what the link label should be for these inputs. For normal positional arguments we can take the name of the argument declaration in the function signature, and for keyword arguments we take the keyword with which the argument is passed in the function invocation. But for variadic arguments there is no specific argument name, not in the function signature, nor in the function invocation. However, we can simply create a link label. We just have to ensure that it doesn't clash with the link labels that will be generated for the positional and keyword arguments. Here the link label will be determined with the following format: `{label_prefix}_{index}` The `label_prefix` is determined the name of the variadic argument. If a function is declared as `function(a, *args, **kwargs)` the prefix will be equal to `args` and if it is `function(*some_var_args)` it will be `some_var_args`. The index will simply be the zero-base index of the argument within the variadic arguments tuple. This would therefore give link labels `args_0`, `args_1` etc. in the first example. If there would be a clash of labels, for example with the function def: def function(args_0, *args): which when invoked as: function(1, *(2, 3)) would generate the labels `args_0` for the first positional argument, but also `args_0` for the first variadic argument. This clash is detected and a `RuntimeError` is raised instructing the user to fix it.
22c59dc
to
36118c8
Compare
Fixes #1172
Up till now, variadic arguments, i.e., arguments defined as
*args
in afunction signature which collects any remaining positional arguments,
were not supported for process functions. The main reason was that it
wasn't immediately clear what the link label should be for these inputs.
For normal positional arguments we can take the name of the argument
declaration in the function signature, and for keyword arguments we take
the keyword with which the argument is passed in the function invocation.
But for variadic arguments there is no specific argument name, not in
the function signature, nor in the function invocation. However, we can
simply create a link label. We just have to ensure that it doesn't clash
with the link labels that will be generated for the positional and
keyword arguments.
Here the link label will be determined with the following format:
The
label_prefix
is determined the name of the variadic argument. If afunction is declared as
function(a, *args, **kwargs)
the prefix willbe equal to
args
and if it isfunction(*some_var_args)
it will besome_var_args
. The index will simply be the zero-base index of theargument within the variadic arguments tuple. This would therefore give
link labels
args_0
,args_1
etc. in the first example.If there would be a clash of labels, for example with the function def:
which when invoked as:
would generate the labels
args_0
for the first positional argument,but also
args_0
for the first variadic argument. This clash isdetected and a
RuntimeError
is raised instructing the user to fix it.