-
On a tangent, about what `wf.add` currently returns: if it instead returned the newly added task, we could write

```python
my_shell_cmd = wf.add(
    inputs=MyShellCmd.Inputs(x="foo", y="bar", ...)
)
assert my_shell_cmd == wf.my_shell_cmd
```

Returning the newly added task should also make it possible (if I have thought it through properly) to get type-checking to work on `lzout`, if we declare it to be of the appropriate `Outputs` type:

```python
task1 = wf.add(
    # By templating the signature like Workflow.add(task: T) -> T, the linter
    # will know that task1 is an IdentityShell, which wouldn't be possible with
    # the current wf.identity_shell way of accessing the added task.
    IdentityShell(inputs=IdentityShell.Inputs(in_file="/path/tofoo"))
)
task2 = wf.add(
    # lzout would be declared as being of type IdentityShell.Outputs, despite
    # that being a white lie.
    IdentityShell(inputs=IdentityShell.Inputs(in_file=task1.lzout.out_file))
)
```
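A minimal, self-contained sketch of the `Workflow.add(task: T) -> T` templating idea. The `Workflow` class here is a toy stand-in for illustration, not the real Pydra class:

```python
import typing as ty

T = ty.TypeVar("T")


class Workflow:
    """Toy stand-in for pydra.Workflow, just to show the typing pattern."""

    def __init__(self) -> None:
        self.tasks: list = []

    def add(self, task: T) -> T:
        # Returning the task unchanged lets a type checker infer that
        # `task1` below is an IdentityShell, so its `lzout` could be
        # declared as IdentityShell.Outputs.
        self.tasks.append(task)
        return task


class IdentityShell:
    pass


wf = Workflow()
task1 = wf.add(IdentityShell())  # inferred by the linter as IdentityShell
```

Because `add` is generic in `T`, no cast is needed at the call site; the inferred type follows whatever task class was passed in.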
-
Another small tangent: how would people feel about aliasing "help_string" with "help", to make it consistent with argparse and click? Also, I think it is currently mandatory, which in most cases I agree with, but I wonder whether it just adds unnecessary work when the field's purpose is obvious from its name, e.g. "in_file".
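One possible shape for such an alias, sketched with stdlib dataclasses standing in for `attrs`; the `arg` helper is hypothetical, not existing Pydra code:

```python
from dataclasses import dataclass, field, fields


def arg(help: str = "", help_string: str = "", **kwargs):
    """Hypothetical helper: accept either spelling, normalise to 'help_string'."""
    return field(metadata={"help_string": help or help_string}, **kwargs)


@dataclass
class Inputs:
    in_file: str = arg(help="input file", default="")


# Both spellings end up under the one canonical metadata key.
helps = {f.name: f.metadata["help_string"] for f in fields(Inputs)}
```

Normalising to a single internal key means downstream code (docstring generation, CLI help) never has to check two names.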
-
Maybe something like this could be useful: https://stackoverflow.com/questions/74103528/type-hinting-an-instance-of-a-nested-class
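The gist of the pattern from that thread: parametrise a generic base over the nested classes using string forward references, so a type checker can resolve attributes of the outer instance. The class names here are illustrative only:

```python
import typing as ty

InputsT = ty.TypeVar("InputsT")
OutputsT = ty.TypeVar("OutputsT")


class TaskBase(ty.Generic[InputsT, OutputsT]):
    inputs: InputsT


class MyTask(TaskBase["MyTask.Inputs", "MyTask.Outputs"]):
    class Inputs:
        x: int = 0

    class Outputs:
        out: float = 0.0


task = MyTask()
task.inputs = MyTask.Inputs()  # a checker now knows task.inputs is MyTask.Inputs
```

The forward references have to be strings because the nested classes don't exist yet when the subclass header is evaluated.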
-
Playing around with this some more, and integrating the suggestions in that Stack Overflow thread, I mocked up this as an alternative syntax for creating workflows, for people wanting the code to be type aware:

```python
import typing as ty

import attrs
import pydra
from pydra.engine.specs import LazyField  # import location may vary by version
from typing_extensions import Self, dataclass_transform  # typing on Python >= 3.11

Inputs = ty.TypeVar("Inputs")
Outputs = ty.TypeVar("Outputs")


@attrs.define(auto_attribs=False)
class BaseTask(ty.Generic[Inputs, Outputs]):
    @property
    def lzout(self) -> Outputs:
        return LzOut(self)  # type: ignore

    @classmethod
    def add_to(cls, wf: pydra.Workflow, inputs: Inputs, name: ty.Optional[str] = None) -> Self:
        if name is None:
            name = cls.__name__
        task = cls(inputs=inputs)
        wf.add(task)
        return task

    inputs: Inputs = attrs.field()


class LzOut:
    def __init__(self, task: BaseTask):
        self.task = task

    def __getattr__(self, name):
        # tp = getattr(attrs.fields(self.task.Inputs), name).type
        lf = LazyField(self.task, "output")
        lf.field = name
        return lf


class FunctionTask(BaseTask[Inputs, Outputs]):
    function: ty.Callable

    def __call__(self):
        return self.function(**attrs.asdict(self.inputs))


@dataclass_transform()
def func_task(klass):
    return attrs.define(auto_attribs=False, kw_only=True)(klass)


@func_task
class MyTask(FunctionTask["MyTask.Inputs", "MyTask.Outputs"]):
    @staticmethod
    def function(x: int, y: float) -> float:
        return x * y

    @attrs.define
    class Inputs:
        x: int = attrs.field()
        y: float = attrs.field()

    @attrs.define
    class Outputs:
        out: float = attrs.field()

    # Not necessary for mypy, but needed for pyright type-checking
    inputs: Inputs
    lzout: Outputs


wf = pydra.Workflow(name="my_wf", input_spec=["x"])
task1 = MyTask.add_to(wf, inputs=MyTask.Inputs(x=wf.lzin.x, y="bad"))
task2 = MyTask.add_to(wf, inputs=MyTask.Inputs(x=task1.lzout.out, y=task1.lzout.out), name="task2")
a: int = task1.inputs.y
```

As in, mypy complains that the 'y' input to task1 is a `str` rather than the declared `float`.
-
Can't leave this alone; here is my latest attempt at a streamlined version, which I'm pretty happy with (note this could live side by side with the existing workflow-construction syntax):

```python
import typing as ty

import attrs
import pydra
from pydra.engine.specs import LazyField  # import location may vary by version
from typing_extensions import dataclass_transform  # typing on Python >= 3.11

T = ty.TypeVar("T")


@attrs.define
class InBase(ty.Generic[T]):
    @classmethod
    def task_type(cls):
        return cls.__args__[0]  # pylint: disable=no-member

    def attach(self, workflow: pydra.Workflow, name=None) -> T:
        task_type = self.task_type()
        if name is None:
            name = task_type.__name__
        task = task_type(inputs=self)
        workflow.add(task)
        return task


@attrs.define
class OutBase(ty.Generic[T]):
    pass


@attrs.define(auto_attribs=False)
class BaseTask:
    @property
    def lzout(self):
        return LzOut(self)

    inputs: InBase = attrs.field()
    lzout: OutBase  # type: ignore[no-redef]


class LzOut:
    def __init__(self, task: BaseTask):
        self.task = task

    def __getattr__(self, name):
        # tp = getattr(attrs.fields(self.task.In), name).type
        lf = LazyField(self.task, "output")
        lf.field = name
        return lf


class FunctionTask(BaseTask):
    function: ty.Callable

    def __call__(self):
        return self.function(**attrs.asdict(self.inputs))


@dataclass_transform()
def func_task(klass):
    return attrs.define(auto_attribs=False, kw_only=True, slots=False)(klass)


@func_task
class MyTask(FunctionTask):
    @staticmethod
    def function(x: int, y: float) -> float:
        return x * y

    @attrs.define
    class In(InBase["MyTask"]):
        x: int = attrs.field()
        y: float = attrs.field()

    @attrs.define
    class Out(OutBase["MyTask"]):
        out: float = attrs.field()

    inputs: In
    lzout: Out


wf = pydra.Workflow(name="my_wf", input_spec=["x"])
task1 = MyTask.In(x=wf.lzin.x, y="bad").attach(wf)
task2 = MyTask.In(x=task1.lzout.out, y=task1.lzout.out).attach(wf, name="task2")
a: int = task1.inputs.y
```
-
Might be a bit late in the day for this week's discussion, but playing around with the shell and function task decorators has got me thinking about what it would take to make Pydra's type declarations lintable.

If we go with the proposal I put forward in #655, then the inputs and outputs would be defined in the `attrs` classes `MyShellCommand.Inputs` and `MyShellCommand.Outputs`. Seeing as `attrs` supports the "dataclass transform" pattern, its init and setattr methods are type-checked, so if users wanted to utilise type-checking in their code we could possibly offer the following syntax.

If we do a bit of magic in the `@shell_task` decorator and insert a back-reference to the task class into the inputs, then we could perhaps make adding a node to a workflow less verbose.

The original syntax could continue to work as it is, but given that VSCode+Pylance now type-checks by default, it will be a bit of a turn-off if we can't offer a way to construct workflows that isn't covered in red.

For function tasks we could perhaps offer the option of defining them as classes, in a similar way to ShellCommands. It is a lot heavier than the elegant `@task` decorator, but it also allows us to set the help strings properly and harmonises the shell and function task definitions.

We could still offer the existing `@task` decorator as a shortcut for small helper functions, and disable the type-checking by putting in a lot of `typing.Any` declarations. But for larger library-style functions (e.g. using dipy or scipy), it might be worth going to the extra effort to declare the inputs/outputs in a type-checkable way.