`ProcessFunction`: Add support for variadic arguments #5691

sphuber · 2022-10-06T09:16:22Z

Up till now, variadic arguments, i.e., arguments defined as *args in a
function signature which collects any remaining positional arguments,
were not supported for process functions. The main reason was that it
wasn't immediately clear what the link label should be for these inputs.

For normal positional arguments we can take the name of the argument
declaration in the function signature, and for keyword arguments we take
the keyword with which the argument is passed in the function invocation.
But for variadic arguments there is no specific argument name, not in
the function signature, nor in the function invocation. However, we can
simply create a link label. We just have to ensure that it doesn't clash
with the link labels that will be generated for the positional and
keyword arguments.

Here the link label will be determined with the following format:

`{label_prefix}_{index}`

The label_prefix is determined the name of the variadic argument. If a
function is declared as function(a, *args, **kwargs) the prefix will
be equal to args and if it is function(*some_var_args) it will be
some_var_args. The index will simply be the zero-base index of the
argument within the variadic arguments tuple. This would therefore give
link labels args_0, args_1 etc. in the first example.

If there would be a clash of labels, for example with the function def:

def function(args_0, *args):

which when invoked as:

function(1, *(2, 3))

would generate the labels args_0 for the first positional argument,
but also args_0 for the first variadic argument. This clash is
detected and a RuntimeError is raised instructing the user to fix it.

chrisjsewell · 2022-10-06T09:22:21Z

Hey @sphuber do you have a specific use case in mind for this?

I would point to the zen python "explicit is better than implicit" and ask why can't we just allow for kwargs and avoid such implicit behaviour?

sphuber · 2022-10-06T09:25:57Z

Hey @sphuber do you have a specific use case in mind for this?

See #1172 . Apparently there were multiple use-cases from users. I don't have one personally but I can definitely see the point. I don't see a strong reason not to allow it. I mean, Python itself supports variadic arguments, so why would we prohibit it?

And finally, users are not forced to use it. If they prefer to be more explicit with positional and keyword arguments, they can do so without any issues.

Together with #5688 to support automatic base type input serialization, I think process functions can become even easier to use. You would be able to

@calcfunction
def sum_integers(*args):
    return sum(args)

assert sum_integers(1, 2, 3) == 6

sphuber · 2022-10-06T09:32:35Z

By the way, once agreed upon the interface, I will update the documentation.

chrisjsewell · 2022-10-06T13:13:35Z

See #1172 . Apparently there were multiple use-cases from users.

This is not a use case to use varadic arguments though, its an argument to use a list of nodes an input, and similarly we should make it easy to use a dict of nodes as an input.

There is an obvious way how to specify in the function construction, what the inputs should be (also allowing for type validation): type annotations:

@calcfunction
def sum_integers(integers: list[orm.Int]) -> orm.Int:
    return orm.Int(sum(*(i.value for i in integers)))

You could even extend this kind of type validation further in the future with: https://docs.python.org/3/library/typing.html#typing.Annotated

Secondly, with the use of input/output lists and dicts, it should be easier to "reconstruct" them.
At present the link table of AiiDA is quite restrictive: https://github.com/aiidateam/aiida-core/blob/1c5dfef09d18895ce327b4b145466373f9f05118/aiida/storage/psql_dos/models/node.py#L149

I think it might be beneficial to add a metadata colmn, where you could signify that this link is part of a list or dict, and the index/key for that list/dict

danielhollas · 2022-10-07T00:53:09Z

Thanks for working on this! I spent quite some time banging my hand against this restriction. 🤕

Hey @sphuber do you have a specific use case in mind for this?

Just as an example of my use case, functionality like this is needed if the number of inputs is not known beforehand. Right now instead of simple calcfunction I need to use a WorkChain with an input with dynamic namespace. Here's an example where I simply want to combine multiple StructureData node into a single TrajectoryData node.

https://github.com/danielhollas/aiidalab-ispg/blob/13f294438422a4eccfe2647baca621860d2416fd/workflows/aiidalab_atmospec_workchain/__init__.py#L55

(happy to hear if there's an easier way)

sphuber · 2022-10-07T06:30:49Z

Thanks for the input @danielhollas . You could in principle use kwargs for your use case. You just have to create a dictionary with arbitrary keys as an input for your StructureData. Something like this:

@calcfunction
def structures_trajectory(**structures):
    return TrajectoryData([structure for structure in structures.values()])

structures = {
    'structure_a': StructureData(),
    'structure_b': StructureData(),
    'structure_c': StructureData(),
}

structures_trajectory(**structures)

With the proposed new syntax, one could do the following:

@calcfunction
def structures_trajectory(*structures):
    return TrajectoryData(structures)

structures = (StructureData(), StructureData())

structures_trajectory(*structures)

I personally think both are fine. As you can see the current existing solution is quite reasonable, it is just that quite a number of users were stumped to find out that varargs are not supported when that is possible in normal Python. I think it couldn't hurt to support the option for those who want it.

ltalirz · 2022-10-07T10:50:20Z

I agree with Chris that the underlying feature people are looking for here is not so much variadic arguments but being able to work with, and seamlessly pass around compounds (lists, dicts, ...) of AiiDA objects.

That the provenance graph does not have a concept of a list of objects (or, in other words, that you can't have an orm.List of AiiDA objects) is quite a significant limitation of the AiiDA language that seems worth tackling.

As for the specific issue in this thread, given that we need to create named links for function inputs, I think requiring inputs to functions to be named is reasonable; I think it just needs to be well documented with examples for canonical use cases like the one above.

sphuber · 2022-10-07T11:08:42Z

As for the specific issue in this thread, given that we need to create named links for function inputs, I think requiring inputs to functions to be named is reasonable; I think it just needs to be well documented with examples for canonical use cases like the one above.

@ltalirz
I think as the various responses from users testify, using varargs is something that people would naturally do. Even if it is documented somewhere that this is not supported with a counter-example based on kwargs, people are unlikely to find this beforehand and not run into this limitation.

I agree that ideally lists and dictionaries of inputs would be supported, but it is not clear how much work it would take to implement this. Is there a real objection to add this functionality in the meantime? Is there a real downside? Adding this now doesn't prevent us from adding direct support for lists and dictionaries later on, would it?

ltalirz · 2022-10-07T13:24:03Z

Is there a real objection to add this functionality in the meantime?

My objection would be that the logic in the implementation and the labels it generates are non-obvious. Once it is in aiida-core, users can start relying on it, and it becomes difficult to change.

An alternative stop-gap solution would be to replace the error message variadic arguments are not supported with something more helpful like

Variadic arguments are not supported. Instead of a list `*args`, pass a dictionary `**kwargs`, whose keys will be used as link labels.

Anyhow, I'm not going to block the PR if people feel strongly that support for *args is needed.

chrisjsewell · 2022-10-07T13:25:57Z

My objection would be that the logic in the implementation and the labels it generates are non-obvious. Once it is in aiida-core, users can start relying on it, and it becomes difficult to change.

Yeh this was kinda my feeling also 😬

danielhollas · 2022-10-07T14:13:15Z

You could in principle use kwargs for your use case. You just have to create a dictionary with arbitrary keys as an input for your StructureData

Oh cool! I didn't know that! Thanks so much @sphuber Nevertheless, I'd say it is still weird to use a dictionary for passing in a list, imo weirder than having surprising link labels, but I don't know enough about how the link labels are generally used by users. In any case, arg_x seems like a very straightforward naming, and the potential clashes with positional arguments should be very rare, why would anybody name the positional argument as arg_0 for example?

An alternative stop-gap solution would be to replace the error message variadic arguments are not supported with something more helpful like

This would go a long way tbh, please implement this regardless of the result of the discussion here. 😊

danielhollas · 2022-10-07T14:15:50Z

I'd also say that calcfunctions, workfunctions are so much more lightweight than CalcJobs / WorkChains, so making them as useful and as painless to use is a goal worth pursuing.

sphuber · 2022-10-07T15:08:19Z

Anyhow, I'm not going to block the PR if people feel strongly that support for *args is needed.

For me it is also not crucial to have this functionality, I have personally never felt the need, but that is probably because I, as a developer, was aware of the limitation and the alternatives. I simply picked up the existing issue since I worked on the process functions anyway (for the automatic input serialization) and since there seemed to be multiple people asking for it, I just implemented it.

In any case, arg_x seems like a very straightforward naming, and the potential clashes with positional arguments should be very rare, why would anybody name the positional argument as arg_0 for example?

This was exactly my thinking. In most cases, people don't even really care about the generated link labels, but in this case, having args_0 etc. seemed perfectly logical to me. The exception in the case of a clash should indeed be super rare and practically never occur, but I put in the code to cover all the bases and not have exceptions in those rare cases.

Anyway, I think all has been said what needs to be said. I propose I quickly present this during the next AiiDA meeting and then we can see what other people think. If they share your concern, then I am perfectly fine with leaving it as is (with an update of the docs explaining this use-case and pointing to the **kwargs usage).

giovannipizzi · 2022-10-19T09:24:32Z

Thanks @sphuber, and all for the feedback. Here is my take.

There are two issues.
- One is allowing this behaviour (I see why people want it, there is an example above where this is useful and we don't really care of the label so I would support this PR (with the comment/change below)
- Then, there is the issue of "list of AiiDA nodes" and "dicts of AiiDA nodes". I totally see the need, but this is more complex to fix (and only partially overlapping - in the case of @danielhollas, I think he would be fine with this PR and does not really need to define a node being the list of multiple StructureData nodes). Other cases instead would require this concept to be in the graph and immutable. So for this latter case, we should open another issue and investigate what needs to be changed (e.g., one of the many possible approaches: create a new new node type "DictOfNodes", and define a new link type "MemberOf" between data nodes - but we should really discuss in another issue).

So, I would be OK with merging this, with one change: I don't like names being created with UUID and in a non-reproducible/understandable way. I suggest that if a potential name clash is found (e.g. one has *args plus any other argument starting with args_ followed by a positive integer), we just raise (similar to what we are now doing with the message Variadic arguments are not supported), explaining the limitation. This should be enough to avoid unexpected behaviour, and at the same point to have the developed fix this at their first use of the (calc/work)function they just wrote.
Probably nobody will ever see this exception, but if one sees it, it's a 5-second thing to change the variable name accordingly.

giovannipizzi · 2022-12-14T10:19:33Z

docs/source/topics/processes/functions.rst

@@ -103,6 +99,24 @@ Note that the inputs **have to be passed as keyword arguments** because they are
 If the inputs would simply have been passed as positional arguments, the engine could have impossibly determined what label to use for the links that connect the input nodes with the calculation function node.
 For this reason, invoking a 'dynamic' function, i.e. one that supports ``**kwargs`` in its signature, with more positional arguments that explicitly named in the signature, will raise a ``TypeError``.

+.. versionadded:: 2.1


This version should be updated?

giovannipizzi · 2022-12-14T10:21:08Z

Thanks! Please udpate the versionadded (see my other comment) and also the main PR message (it still mentions UUIDs). Then consider it approved by me!

Up till now, variadic arguments, i.e., arguments defined as `*args` in a function signature which collects any remaining positional arguments, were not supported for process functions. The main reason was that it wasn't immediately clear what the link label should be for these inputs. For normal positional arguments we can take the name of the argument declaration in the function signature, and for keyword arguments we take the keyword with which the argument is passed in the function invocation. But for variadic arguments there is no specific argument name, not in the function signature, nor in the function invocation. However, we can simply create a link label. We just have to ensure that it doesn't clash with the link labels that will be generated for the positional and keyword arguments. Here the link label will be determined with the following format: `{label_prefix}_{index}` The `label_prefix` is determined the name of the variadic argument. If a function is declared as `function(a, *args, **kwargs)` the prefix will be equal to `args` and if it is `function(*some_var_args)` it will be `some_var_args`. The index will simply be the zero-base index of the argument within the variadic arguments tuple. This would therefore give link labels `args_0`, `args_1` etc. in the first example. If there would be a clash of labels, for example with the function def: def function(args_0, *args): which when invoked as: function(1, *(2, 3)) would generate the labels `args_0` for the first positional argument, but also `args_0` for the first variadic argument. This clash is detected and a `RuntimeError` is raised instructing the user to fix it.

sphuber requested review from ltalirz and unkcpz October 6, 2022 09:16

sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from a902244 to afe4e74 Compare October 6, 2022 12:36

sphuber force-pushed the main branch from 1c5dfef to 41e9bf3 Compare October 7, 2022 06:34

sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from 3c8b81f to b7824c4 Compare October 7, 2022 10:04

sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from b7824c4 to 571577f Compare October 7, 2022 12:47

danielhollas mentioned this pull request Oct 7, 2022

Boltzmann weighting in SpectrumWidget ispg-group/aiidalab-ispg#42

Closed

4 tasks

sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from 571577f to 22c59dc Compare December 7, 2022 12:21

giovannipizzi reviewed Dec 14, 2022

View reviewed changes

sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from 22c59dc to 36118c8 Compare December 14, 2022 11:13

sphuber merged commit 68cd30d into aiidateam:main Dec 14, 2022

sphuber deleted the feature/1172/process-function-variadic-arguments branch December 14, 2022 11:37

danielhollas mentioned this pull request Mar 13, 2024

Calcfunctions do not properly handle *args #6309

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ProcessFunction`: Add support for variadic arguments #5691

`ProcessFunction`: Add support for variadic arguments #5691

sphuber commented Oct 6, 2022 •

edited

Loading

chrisjsewell commented Oct 6, 2022

sphuber commented Oct 6, 2022 •

edited

Loading

sphuber commented Oct 6, 2022

chrisjsewell commented Oct 6, 2022 •

edited

Loading

danielhollas commented Oct 7, 2022 •

edited

Loading

sphuber commented Oct 7, 2022 •

edited

Loading

ltalirz commented Oct 7, 2022 •

edited

Loading

sphuber commented Oct 7, 2022

ltalirz commented Oct 7, 2022 •

edited

Loading

chrisjsewell commented Oct 7, 2022

danielhollas commented Oct 7, 2022 •

edited

Loading

danielhollas commented Oct 7, 2022

sphuber commented Oct 7, 2022

giovannipizzi commented Oct 19, 2022

giovannipizzi Dec 14, 2022

giovannipizzi commented Dec 14, 2022

ProcessFunction: Add support for variadic arguments #5691

ProcessFunction: Add support for variadic arguments #5691

Conversation

sphuber commented Oct 6, 2022 • edited Loading

chrisjsewell commented Oct 6, 2022

sphuber commented Oct 6, 2022 • edited Loading

sphuber commented Oct 6, 2022

chrisjsewell commented Oct 6, 2022 • edited Loading

danielhollas commented Oct 7, 2022 • edited Loading

sphuber commented Oct 7, 2022 • edited Loading

ltalirz commented Oct 7, 2022 • edited Loading

sphuber commented Oct 7, 2022

ltalirz commented Oct 7, 2022 • edited Loading

chrisjsewell commented Oct 7, 2022

danielhollas commented Oct 7, 2022 • edited Loading

danielhollas commented Oct 7, 2022

sphuber commented Oct 7, 2022

giovannipizzi commented Oct 19, 2022

giovannipizzi Dec 14, 2022

Choose a reason for hiding this comment

giovannipizzi commented Dec 14, 2022

`ProcessFunction`: Add support for variadic arguments #5691

`ProcessFunction`: Add support for variadic arguments #5691

sphuber commented Oct 6, 2022 •

edited

Loading

sphuber commented Oct 6, 2022 •

edited

Loading

chrisjsewell commented Oct 6, 2022 •

edited

Loading

danielhollas commented Oct 7, 2022 •

edited

Loading

sphuber commented Oct 7, 2022 •

edited

Loading

ltalirz commented Oct 7, 2022 •

edited

Loading

ltalirz commented Oct 7, 2022 •

edited

Loading

danielhollas commented Oct 7, 2022 •

edited

Loading