-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add @byrow attempt 2 #250
Add @byrow attempt 2 #250
Conversation
Thank you for working on it. Note that CI fails. |
test/dataframes.jl
Outdated
y = [:v, :w, :x, :y, :z], | ||
c = [:g, :quote, :body, :transform, missing]) | ||
|
||
@test @transform(df, n = @byrow :i).n == df.i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some tests checking the whole output DataFrame
not only its specific columns.
Also do we support GroupedDataFrame
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated tests to be more reflective of the types of cases we are concerned with.
I will updated the GroupedDataFrame
tests once we think these are good. They are in a different file.
Okay I've added a very lengthy documentation section in Could use an approval of the tests, any comments on the API as discussed in |
docs/src/index.md
Outdated
@@ -268,6 +268,178 @@ df2 = @eachrow df begin | |||
end | |||
``` | |||
|
|||
## Row-wise transformations with `@byrow` | |||
|
|||
DataFrames provides the function-wrapper `ByRow`. `ByRow(f)(x, y)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what style you follow in DataFramesMeta.jl, but in DataFrames.jl we always add .jl to package names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added .jl
but will make a point to do this before 1.0.
docs/src/index.md
Outdated
``` | ||
|
||
!!! note | ||
Unlike `@.`, `@byrow` is not a "real" macro and cannot be used outside of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the question is what are the parsing rules for @byrow
? does it take exactly one expression that follows it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included a more detailed discussion in updated docs. Hopefully it's clear.
docs/src/index.md
Outdated
### Comparison with `@eachrow` | ||
|
||
In previous versions of DataFramesMeta, `@eachrow` was named `@byrow`. | ||
This version of `@byrow` is deprecated, but the syntax can be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the syntax" - which syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarified.
docs/src/index.md
Outdated
``` | ||
|
||
The function `*` is applied by-row. But the result of those operations | ||
is not stored in a new vector. Additionally, `@eachrow` and `@eachrow!` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is it stored then? (or not stored unless stored explicitly?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not stored. It's literally just a for
-loop. Hopefully the re-write is clear.
docs/src/index.md
Outdated
and broadcasts that function. Consequently, it does not broadcast | ||
objects that are referenced which are not columns. | ||
```julia | ||
@with df @byrow :x + [5, 6] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this always error? What if column :x
would be a vector of two element vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have clarified in the example.
docs/src/index.md
Outdated
``` | ||
will error. On the other hand | ||
```julia | ||
@with df @. :x + [5, 6] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will only work if :x
has 2 elements - right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I have added more detail.
Thanks a ton for reading all this. I re-wrote a lot of this documentation for clarity based on your comments. Hopefully it's more complete now. |
It's worth noting that the special behavior related to |
docs/src/index.md
Outdated
used, functions do not take advantage of the grouping, so the | ||
behavior of `@transform(df, y = @byrow f(:x))` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add this comment as with @combine
the result would be different:
used, functions do not take advantage of the grouping, so the | |
behavior of `@transform(df, y = @byrow f(:x))` and | |
used, functions do not take advantage of the grouping, so | |
for example the result of `@transform(df, y = @byrow f(:x))` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch. Too niche to mention in the docs though.
docs/src/index.md
Outdated
|
||
### Comparison with `@eachrow` | ||
|
||
In previous versions of DataFramesMeta, `@eachrow` was named `@byrow`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In previous versions of DataFramesMeta, `@eachrow` was named `@byrow`. | |
In previous versions of DataFramesMeta.jl, `@eachrow` was named `@byrow`. |
Thanks for the detailed review! |
Let us wait for @nalimilan to have a look at it now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I've made comments mainly about the docs. Overall I think it would make sense to put less stress on the technical details to target "basic" users.
I also wonder whether @byrow z = :x * :y
shouldn't be allowed as it seems a bit simpler not to put @byrow
in the middle of the transformation. It's also more consistent with @byrow begin... end
. What do you think?
@@ -268,6 +268,228 @@ df2 = @eachrow df begin | |||
end | |||
``` | |||
|
|||
## Row-wise transformations with `@byrow` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention @byrow
at the top of the file?
Rather than starting the section with technical details, it would be more user-friendly to say what @byrow
does first, then show examples, and only then mention ByRow
and the fact that @byrow
isn't a real macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second paragraph still applies: it would be nice to start with a sentence or two saying that @byrow
allows writing code that is applied to each row instead of having to vectorize it.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
No, I don't think we should do this. It breaks the mapping to w.r.t. detailed technical docs vs introductory docs, if we were to make a docstring for EDIT: Unfortunately |
Well the mapping isn't 1-to-1 already: while the LHS corresponds to
Can't you define a dummy macro that throws an error? |
This is a good point. I will think about it. Maybe the implementation is easy.
Yeah that sounds good. I will do that and switch the docstrings around. |
Okay I'm going to allow
because with the
so it would be weird to allow that and not the same for multiple arguments. |
Okay two substantive changes to the API
I added docstrings to all the relevant macros (except Could use a review to go over the docstrings, I guess. But I'm getting anxious to merge this into master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
I have incorporated the requested changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
This supercedes #239
Here is a docstring indicating how
@byrow
works@transform
uses the syntax@byrow
to wrap transformations inthe
ByRow
function wrapper from DataFrames, enabling broadcastingand more. For example, the call
becomes
a transformation which cannot be conveniently expressed
using broadcasting.
To avoid writing
@byrow
multiple times when performing multipletransformations by row,
@transform
allows@byrow
at thebeginning of a block of transformations. All transformations
in the block will operate by row.
The implementation is a bit clunky with
create_args_vector
returning atuple
of the vector of arguments and whether to wrap inByRow
. This should be less clunky after we clean up the@combine
code, which I will do in a later PR. So I accumulate a little bit of technical debt I promise I will pay off in the future.I have added tests for
@transform
, but not others, until we find all the edge cases we need. Overall I am pleased with this PR. I think the work in #245 made this really easy. It also preserves the snappy function parsing in #221, though I still have to add those tests.All things considered, this is ready for a review @nalimilan and @bkamins.