Support NamedTuples for Chain + Parallel #1681

Merged
merged 21 commits into from
Aug 4, 2021


Member

@mcabbott mcabbott commented Jul 29, 2021

Closes #1680, WIP. Todo list includes:

@DhairyaLGandhi
Member

ref #1682

@mcabbott
Member Author

ref #1682

Oh. The goal of this one is to be transparent. Just as the Tuple is never visible (you just provide arguments to Chain, and then m[1] retrieves them), the NamedTuple should be invisible, and m.name should just work.

@DhairyaLGandhi
Member

Seems like this PR requires taking on a significant amount of complication. And a Chain is not a Tuple. The parallel to draw might be m[:key], like m[1].

A Chain is a struct, and dot syntax typically denotes getting the fields (or properties) of the struct. If it's desirable to overload the dot syntax, one can define the following, which gives precedence to the field of the struct over the keys of the NamedTuple. That is the expected outcome anyway.

julia> Base.getproperty(c::Chain, k::Symbol) = k == :layers ? Base.getfield(c, :layers) : Base.getproperty(Base.getfield(c, :layers), k)
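To illustrate the precedence rule described in this comment, here is a minimal sketch that does not depend on Flux. `ToyChain` is a hypothetical stand-in for `Chain`, not the actual Flux definition, and the layer names are made up for the example.

```julia
# Hypothetical stand-in for Flux's Chain; not the actual Flux definition.
struct ToyChain{T}
    layers::T
end

# The real field wins; any other name is looked up in the stored NamedTuple.
function Base.getproperty(c::ToyChain, k::Symbol)
    k === :layers && return getfield(c, :layers)
    return getproperty(getfield(c, :layers), k)
end

m = ToyChain((enc = sin, dec = cos))
m.enc === sin                   # true: forwarded to the NamedTuple
m.layers isa NamedTuple         # true: the struct field itself

# A key that collides with the field name is shadowed by dot access,
# but stays reachable by indexing into the stored NamedTuple.
clash = ToyChain((layers = sin, dec = cos))
clash.layers isa NamedTuple     # true: the field, not `sin`
clash.layers[:layers] === sin   # true
```

The last two lines show the one case where dot access and key lookup disagree, which is the ambiguity debated later in this thread.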

Comment on lines +22 to +27
if obj isa Chain{<:NamedTuple} && children == getfield(obj, :layers)
  # then we insert names -- can this be done more generically?
  for k in Base.keys(obj)
    _big_show(io, obj[k], indent+2, k)
  end
elseif obj isa Parallel{<:Any, <:NamedTuple}
Member Author

A bit ugly, but allows (and parses) this:

Chain(
  I = Dense(3, 5),                      # 20 parameters
  II = Parallel(
    vcat,
    α = Dense(5, 4),                    # 24 parameters
    β = Chain(
      i = Dense(5, 7),                  # 42 parameters
      ii = Dense(7, 4),                 # 32 parameters
    ),
  ),
  III = Dense(8, 17),                   # 153 parameters
)                   # Total: 10 arrays, 271 parameters, 1.840 KiB.
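The parameter counts in this printout can be checked by hand. The sketch below assumes each `Dense(in, out)` holds an `out × in` weight matrix plus a bias vector of length `out`. The helper `dense_params` is hypothetical and only does the arithmetic, it is not part of Flux.

```julia
# Parameters of a dense layer with bias: weight matrix (nout × nin) plus bias (nout).
dense_params(nin, nout) = nin * nout + nout

# One entry per Dense layer in the printout above, keyed by its printed name.
counts = (
    I   = dense_params(3, 5),
    α   = dense_params(5, 4),
    i   = dense_params(5, 7),
    ii  = dense_params(7, 4),
    III = dense_params(8, 17),
)

total  = sum(values(counts))    # 20 + 24 + 42 + 32 + 153 = 271
arrays = 2 * length(counts)     # one weight and one bias per layer = 10
```

The input width of 8 for layer `III` follows from `vcat` joining the two length-4 outputs of the `Parallel` branches.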

Member

I think this looks great. It's definitely info dense, but that's a user choice (I assume the names here are not auto-inserted). I wouldn't name my layers so deep into the model.

Member Author

Yes, this is just to check that the nested printing doesn't try to impose good taste on you. If you don't provide names, then the storage is a Tuple as before, and none are printed.

I have also wondered whether the printing should somehow number the un-named layers, so that you can see what m[13] is going to give you, but didn't think of a nice way.

Member

Yeah in the original prototype in my head, I was going to force NamedTuples and auto-generated names like L$i. But I prefer the current route. In general, I think if you aren't going to name a layer, then you only care about iterating the group, not indexing specific ones. So support for the m[13] case probably wouldn't be a greatly missed feature.

@mcabbott
Member Author

mcabbott commented Jul 29, 2021

one can define

Yes, but normally with ===, like here:

https://github.com/FluxML/Flux.jl/pull/1681/files#diff-b6c250c343270a4d87fe8ae8dc99e882c5a1b7464e516e60b4b28e14856fb202R43

parallel to draw might be m[:key], as m[1]

Yes, this works too.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Jul 29, 2021

So, if we want the dot-access functionality, I can add that to #1682 over this, and this PR can work with the printing then?

@mcabbott mcabbott changed the title Support NamedTuples for Chain Support NamedTuples for Chain + Parallel Jul 29, 2021
@DhairyaLGandhi
Member

Yeah, I think I'm uncomfortable taking on the dependency here especially since a simpler solution exists.

@darsnack
Member

Maybe remove all the getfield(c, :layers) to minimize code changes? At this point, c.layers will route through getproperty. In which case, I think #1681 and #1682 are the same approach.

@DhairyaLGandhi
Member

In that case, let #1682 handle the constructor, and this PR can handle printing.

@mcabbott
Member Author

mcabbott commented Jul 29, 2021

Maybe remove all the getfield(c, :layers) to minimize

Maybe! What I think needs a close look before merging is whether this triggers any bad behaviour from Zygote. There are issues around getproperty, getfield and literal_getproperty. I don't have the latest status in my head, but it needs a careful check that this doesn't produce some huge regression.

let #1682 handle the constructor

Not sure this is such a huge change that it gains by being split across multiple PRs. What #1682 proposes to add which isn't added here is a constructor Chain(::NamedTuple). That does run at present, but doesn't produce anything useful, since tuples aren't callable. There is no corresponding Chain(::Tuple) constructor, because the Tuple is really an implementation detail. Instead the default constructor takes the layers directly, and the named constructor here does the same.
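The distinction drawn here between passing layers directly and passing a container can be sketched without Flux. `ToyChain` below is a hypothetical stand-in, written under the assumption that positional layers are stored as a Tuple and keyword layers as a NamedTuple, as the comment describes. It is not the code from this PR.

```julia
# Hypothetical stand-in for Flux's Chain, for illustration only.
struct ToyChain{T}
    layers::T
    # Positional layers are collected into a Tuple.
    ToyChain(xs...) = new{typeof(xs)}(xs)
    # Keyword layers are collected into a NamedTuple.
    function ToyChain(; kw...)
        nt = values(kw)
        return new{typeof(nt)}(nt)
    end
end

# Indexing works by position in both cases, and by name for the keyword form.
Base.getindex(c::ToyChain, i) = getfield(c, :layers)[i]

unnamed = ToyChain(sin, cos)
named   = ToyChain(enc = sin, dec = cos)

unnamed.layers isa Tuple        # true
named.layers isa NamedTuple     # true
named[:dec] === cos             # true
named[1] === sin                # true

# Passing a NamedTuple positionally just wraps it in a one-element Tuple,
# so the single "layer" is the NamedTuple itself, which is not callable.
wrapped = ToyChain((enc = sin, dec = cos))
wrapped.layers isa Tuple{<:NamedTuple}   # true
```

In this sketch the container type never appears in the user's call, which is the sense in which the Tuple or NamedTuple is an implementation detail.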

What could use some help is thinking up nasty test cases where things might go wrong, or be ambiguous. Or, as above, thinking up ways to test for performance / inference issues.

@DhairyaLGandhi
Member

Dot access doesn't seem terribly critical, especially since the Chain's keys are already accessible through the indexing syntax, which avoids gotchas with getproperty. And the code is simpler in #1682. It would also help to remove overloads that aren't consistent between the tuple and named tuple cases.

@mcabbott
Member Author

Yes, I think dot access is a nice-to-have, not an essential feature. It will be trivial to remove if it does turn out to cause problems we can't solve.

@mcabbott mcabbott mentioned this pull request Aug 1, 2021
darsnack previously approved these changes Aug 1, 2021
Member

@darsnack darsnack left a comment

I think this is good to go now that everything orthogonal will be handled by separate PRs.

darsnack previously approved these changes Aug 4, 2021
@mcabbott
Member Author

mcabbott commented Aug 4, 2021

Shall we see which of the overlapping sets of rules the robot follows?

bors r+

bors bot added a commit that referenced this pull request Aug 4, 2021
1681: Support NamedTuples for Chain + Parallel r=mcabbott a=mcabbott

Closes #1680, WIP. Todo list includes:

- [x] add Parallel too
- [ ] ~~worry about whether any of this will upset Zygote, like FluxML/Zygote.jl#909~~ or, kick that can down the road.
- [x] add tests

Co-authored-by: Michael Abbott
@bors
Contributor

bors bot commented Aug 4, 2021

This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried.

Additional information:

{"message":"1 review requesting changes and 1 approving review by reviewers with write access.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

Member

@DhairyaLGandhi DhairyaLGandhi left a comment

Are we then to add #1682 on top to clean up the implementation?

@darsnack
Member

darsnack commented Aug 4, 2021

What's there to clean up? If you're talking about Chain((; a = ..., b = ...)) vs Chain(a = ..., b = ...), then I think the latter is more consistent with Flux's current behavior.

Co-authored-by: Dhairya Gandhi
@DhairyaLGandhi
Member

Well, there are certain departures from Flux coding style in this PR (creating additional tuples, adding conditionals on keys, etc.), whereas #1682 extends the existing Chain implementation without needing the Tuples, which add some code noise that would be good to consolidate.

There's also a kwarg-only version in #1682. I think it's better to have the named tuple explicitly in there, so that there's no ambiguity on the field-keys issue. Field access will always get you the field, and indexing will always fetch the key from the named tuple, making :layers a valid key in the named tuple. One must always use indexing to access keys in a Chain. Dot-Field access isn't ergonomic since it introduces the aforementioned ambiguity.

@darsnack
Member

darsnack commented Aug 4, 2021

Well, there are certain departures from Flux coding style in this PR (creating additional tuples, adding conditionals on keys, etc.), whereas #1682 extends the existing Chain implementation without needing the Tuples, which add some code noise that would be good to consolidate.

The conditional on the keys has been discussed a couple of times above. It's not necessary, but it is an easy way to allow us to add dot-syntax in the future without releasing a breaking change. It's not pretty, but it is a very straightforward bit of code that will make downstream features easier. We can always remove it without a breaking change too.

Re: Tuples, it's just a different way of making applychain work. If you want, this PR can easily be updated to use an additional dispatch path for NamedTuple like #1682. There's no need for a separate PR to add this.

There's also a kwarg-only version in #1682. I think it's better to have the named tuple explicitly in there, so that there's no ambiguity on the field-keys issue. Field access will always get you the field, and indexing will always fetch the key from the named tuple, making :layers a valid key in the named tuple. One must always use indexing to access keys in a Chain. Dot-Field access isn't ergonomic since it introduces the aforementioned ambiguity.

None of that is in this PR anymore. I really don't see how this PR and #1682 are materially different. There appear to be only small differences like allowing Chain(::NamedTuple) as a constructor (could be added here but I don't see the benefit) and the applychain approach. I would say that feature-wise, this PR is strictly a superset of #1682.

@mcabbott
Member Author

mcabbott commented Aug 4, 2021

creating additional tuples

You mean like Chain(+, +) creates an additional Tuple? That kind of "departures from Flux coding style"?

That "additional" tuple, an implementation detail hidden from the user, is precisely what this PR duplicates.

If you think this simple little mechanism needs multiple ways to invoke it, then you are of course welcome to argue for them, in follow-up issues or PRs or whatever. But they must stand on their own merits, and don't really block this one, which is the canonical simple obvious thing. Maybe there really has been a longstanding desire among some users to write Chain((Dense(1,2,tanh), Dense(2,3))) with extra brackets in it.

Dot-Field access isn't ergonomic since it introduces the aforementioned ambiguity.

No, the ambiguity is explicitly avoided. Not one Chain in the wild has a layer named :layers, and none ever will, because an inner constructor forbids it.

Dot field access is pretty ergonomic, since you don't have to type stupid brackets with your pinkies. That's why NamedTuples support it, in addition to x[:y]. Lots of structs pass dot field access on to constituents which aren't fields, that's literally the reason getproperty was introduced.
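A minimal sketch of how a constructor check can rule out the clash between the field name and a layer name, so that dot access has only one possible meaning. `GuardedChain` and its error message are hypothetical and written for this example. The check in the actual PR may be worded or placed differently.

```julia
# Hypothetical stand-in for Flux's Chain, for illustration only.
struct GuardedChain{T}
    layers::T
    function GuardedChain(; kw...)
        # Reject the one name that would collide with the struct's own field.
        :layers in keys(kw) && throw(ArgumentError("a layer cannot be named `layers`"))
        nt = values(kw)
        return new{typeof(nt)}(nt)
    end
end

# With the guard in place, every name other than :layers must be a layer name.
function Base.getproperty(c::GuardedChain, k::Symbol)
    k === :layers && return getfield(c, :layers)
    return getfield(c, :layers)[k]
end

m = GuardedChain(enc = sin, dec = cos)
m.enc === sin    # true

# Constructing with the reserved name fails up front.
rejected = try
    GuardedChain(layers = sin)
    false
catch err
    err isa ArgumentError
end
```

Because the check runs in the inner constructor, no instance of this sketch type can be built with a layer called `layers` through any outer constructor either.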

It got removed here solely because of concerns that this may be a (temporary) Zygote performance hurdle. It would probably have been much quicker to investigate that, than whatever it is we're all doing here a week later.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Aug 4, 2021

The additional tuples that I am referring to is (c::Chain)(x) = applychain(Tuple(c.layers), x) which can be handled with a dispatch to applychain.

there really has been a longstanding desire among some users to write [...] with extra brackets in it.

But #1682 also has the kwarg only constructor, so I'm not sure if this is fair.

No, the ambiguity is explicitly avoided.

I think the argument was that there is a self consistent way to understand :layers if users insist. I guess if we are punting on :layers then 🤷

@DhairyaLGandhi DhairyaLGandhi dismissed their stale review August 4, 2021 16:18

Need to move to the next bit

@DhairyaLGandhi
Member

bors r+

bors bot added a commit that referenced this pull request Aug 4, 2021
1681: Support NamedTuples for Chain + Parallel r=DhairyaLGandhi a=mcabbott

Closes #1680, WIP. Todo list includes:

- [x] add Parallel too
- [ ] ~~worry about whether any of this will upset Zygote, like FluxML/Zygote.jl#909~~ or, kick that can down the road.
- [x] add tests

Co-authored-by: Michael Abbott
@mcabbott
Member Author

mcabbott commented Aug 4, 2021

The additional tuples that I am referring to is (c::Chain)(x) = applychain(Tuple(c.layers), x) which can be handled with a dispatch.

Ok, I see Kyle was able to decode that. There are of course other ways to write it, but this seems like fewer characters than the others I tried. It emphasises that there is one path of functionality -- it's not dispatching to routines that do different things, just standardising.
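The two ways of applying named layers that are being compared here can be sketched side by side. The recursive `applychain` below is a hypothetical helper that borrows the name used in the discussion. `call_normalised` and `call_dispatched` are made-up names for the two options, and the NamedTuple method is only an assumed shape for the dispatch-based alternative.

```julia
# Hypothetical helper mirroring the name `applychain` used in the discussion:
# apply each function in the tuple to the running value, left to right.
applychain(::Tuple{}, x) = x
applychain(fs::Tuple, x) = applychain(Base.tail(fs), first(fs)(x))

# Option 1 (as quoted above): normalise the storage to a Tuple at the call site.
call_normalised(layers, x) = applychain(Tuple(layers), x)

# Option 2 (assumed shape of the alternative): an extra method for NamedTuple.
applychain(fs::NamedTuple, x) = applychain(Tuple(fs), x)
call_dispatched(layers, x) = applychain(layers, x)

named = (double = x -> 2x, inc = x -> x + 1)
plain = Tuple(named)

call_normalised(named, 3)   # 7
call_dispatched(named, 3)   # 7
call_normalised(plain, 3)   # 7
```

Both options end up in the same Tuple method, so in this sketch the choice affects only where the conversion is written, not what is computed.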

@bors
Contributor

bors bot commented Aug 4, 2021

This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried.

Additional information:

{"message":"At least 1 approving review is required by reviewers with write access.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

@darsnack
Member

darsnack commented Aug 4, 2021

bors r+

@bors
Contributor

bors bot commented Aug 4, 2021

Build succeeded:

@bors bors bot merged commit dbb9f82 into FluxML:master Aug 4, 2021
@mcabbott mcabbott deleted the named branch August 4, 2021 17:07
Development

Successfully merging this pull request may close these issues.

Support NamedTuples for Container Layers
4 participants