Add @byrow attempt 2 #250
Changes from 8 commits
dc88ed8
33248f7
2112f4c
d61df1a
d3e401b
eab9743
f4be9f0
61d96f4
480ef54
d9662f0
e0e1307
e85b44c
ae5a399
bf6f876
1c426a1
c3c6454
c8a4ae7
e9d110a
08df117
216dbe0
b75f3c2
288f192
ebf6189
710b53a
eac79e0
@@ -268,6 +268,178 @@ df2 = @eachrow df begin
end
```

## Row-wise transformations with `@byrow`

DataFrames provides the function wrapper `ByRow`. `ByRow(f)(x, y)`
is roughly equivalent to `f.(x, y)`, with a few exceptions discussed below.

Review comment: I am not sure what style you follow in DataFramesMeta.jl, but in DataFrames.jl we always add .jl to package names
Reply: added

DataFramesMeta allows users to construct expressions using the `ByRow`
function wrapper with the syntax `@byrow`.

```julia
@transform(df, y = @byrow :x == 1 ? "true" : "false")
```

becomes

```julia
transform(df, :x => ByRow(x -> x == 1 ? "true" : "false") => :y)
```
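
To make the "roughly equivalent to `f.(x, y)`" claim concrete, here is a minimal sketch in plain Julia. `RowWise` is a hypothetical stand-in written only for this illustration. It is not the `ByRow` implementation in DataFrames; it only mimics the idea of wrapping a scalar function so that it is applied element by element across columns.

```julia
# Hypothetical stand-in for a row-wise function wrapper (not DataFrames' ByRow).
struct RowWise{F}
    f::F
end

# Calling the wrapper applies `f` to corresponding elements of each column.
(r::RowWise)(cols::AbstractVector...) = map(r.f, cols...)

x = [1, 2, 3]
y = [10, 20, 30]

# Wrapping `+` behaves like the broadcast `x .+ y`.
rowwise_sum = RowWise(+)(x, y)
broadcast_sum = x .+ y

# The wrapped function is called on scalars, so ordinary control flow works inside it.
labels = RowWise(v -> v == 1 ? "true" : "false")(x)
```

Because the wrapped function only ever sees one element per column, the ternary in the last line needs no vectorization, which is the convenience `@byrow` is meant to provide.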

!!! note
    Unlike `@.`, `@byrow` is not a "real" macro and cannot be used outside of
    DataFramesMeta macros. However, its behavior within DataFramesMeta
    macros should be indistinguishable from externally defined macros.

Review comment: the question is what are the parsing rules for
Reply: I included a more detailed discussion in updated docs. Hopefully it's clear.

### Comparison with `@eachrow`

In previous versions of DataFramesMeta, `@eachrow` was named `@byrow`.
This version of `@byrow` is deprecated, but the syntax can be used
for similar, but not identical, behavior.

Review comment: "the syntax" - which syntax?
Reply: clarified.

The syntax

```julia
@eachrow df begin
    :a * :b
end
```

is similar to

```julia
begin
    function tempfun(a, b)
        for i in eachindex(a)
            a[i] * b[i]
        end
    end
    tempfun(df.a, df.b)
    df
end
```

The function `*` is applied by row, but the results of those operations
are discarded rather than stored in a new vector. Additionally, `@eachrow`
and `@eachrow!` return data frames.

Review comment: where is it stored then? (or not stored unless stored explicitly?)
Reply: it's not stored. It's literally just a

By contrast,

```julia
@with df @byrow begin
    :a * :b
end
```

is similar to

```julia
tempfun(a, b) = a * b
tempfun.(df.a, df.b)
```

`@with` combined with `@byrow` will return a vector of the
broadcasted multiplication and not a data frame.

Additionally, `@eachrow` and `@eachrow!` allow modifying a
data frame. Just as with Base Julia broadcasting, `@byrow` will
not update columns.

```
julia> df = DataFrame(a = [1, 2], b = [3, 4]);

julia> @with df @byrow begin
           :a = 500
       end
2-element Vector{Int64}:
 500
 500

julia> df
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4
```
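
The same distinction can be seen in plain Julia, without any data frame. In this sketch the vector `a` stands in for the column `:a`, and the function names are made up for the illustration. A loop that writes through an index (the kind of update `@eachrow` allows) mutates the vector, while broadcasting a function that assigns to its own argument only rebinds a local variable.

```julia
a = [1, 2]

# Broadcasting: `x = 500` rebinds the local argument `x` and returns 500,
# so the result is a new vector and `a` is left untouched.
assign_local(x) = (x = 500)
result = assign_local.(a)

# Snapshot of `a` after broadcasting, before the in-place loop runs.
a_before = copy(a)

# Explicit loop over indices: writing to `col[i]` mutates the vector in place.
function assign_inplace!(col)
    for i in eachindex(col)
        col[i] = 500
    end
    return col
end

assign_inplace!(a)
```

After this runs, `result` holds the broadcast values, `a_before` still holds the original data, and only the explicit loop has changed `a`.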

### Comparison with `@.` and Base broadcasting

Base Julia provides the broadcasting macro `@.`, and in many cases `@.`
and `@byrow` will give equivalent results. But there are important
deviations in behavior. Consider the setup

```julia
df = DataFrame(a = [1, 2], b = [3, 4])
```

* Control flow. In all versions of Julia, expressions of the form
  `if...else` and `a ? b : c` cannot be broadcasted. In versions below
  1.7-dev, expressions of the form `a && b` and `a || b` cannot be
  broadcasted. Consequently, the `@.` macro will fail when encountering such
  control flow while `@byrow` will not.
  ```
  julia> @with df @byrow begin
             if :a == 1
                 5
             else
                 10
             end
         end
  2-element Vector{Int64}:
    5
   10

  julia> @with df @. begin
             if :a == 1
                 5
             else
                 10
             end
         end # will error
  ```

* Broadcasting objects that are not columns. `@byrow` constructs an
  anonymous function *which accepts only the columns of the data frame*
  and broadcasts that function. Consequently, it does not broadcast
  referenced objects that are not columns.
  ```julia
  @with df @byrow :a + [5, 6]
  ```
  will error. On the other hand
  ```julia
  @with df @. :a + [5, 6]
  ```
  will not.

Review comment (on the `@byrow` example): will this always error? What if column
Reply: Yes, I have clarified in the example.
Review comment (on the `@.` example): It will only work if
Reply: yes. I have added more detail.

* Broadcasting expensive calls. In Base Julia, broadcasting
  evaluates calls first and then broadcasts the result. Because
  `@byrow` constructs an anonymous function and evaluates
  that function for every row in the data frame, expensive functions
  will be evaluated many times.
  ```julia
  julia> function expensive()
             sleep(.5)
             return 1
         end;

  julia> @time @with df @byrow :a + expensive();
    1.037073 seconds (51.67 k allocations: 3.035 MiB, 3.19% compilation time)

  julia> @time @with df :a .+ expensive();
    0.539900 seconds (110.67 k allocations: 6.525 MiB, 7.05% compilation time)
  ```
  This problem comes up when using the `@.` macro as well, but can easily be fixed with `$`.
  ```julia
  julia> @time @with df @. :a + expensive();
    1.036888 seconds (97.55 k allocations: 5.617 MiB, 3.20% compilation time)

  julia> @time @with df @. :a + $expensive();
    0.537961 seconds (110.68 k allocations: 6.525 MiB, 6.73% compilation time)
  ```
  No such solution currently exists with `@byrow`.
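
The three differences listed above can be reproduced in plain Julia, with a vector `a` standing in for the column `:a` and an explicitly broadcast anonymous function standing in for what `@byrow` builds. This is only a sketch of the semantics under those assumptions, not DataFramesMeta's implementation. A call counter (`call_count`, a name chosen for this example) replaces `sleep` so that the number of evaluations is visible directly.

```julia
a = [1, 2]

# 1. Control flow: a ternary inside a broadcast function is fine,
#    but `@.` turns the condition into a BitVector, which `?:` rejects.
byrow_like = (x -> x == 1 ? 5 : 10).(a)
dot_macro_failed = try
    @. a == 1 ? 5 : 10
    false
catch err
    err isa TypeError
end

# 2. Non-column objects: the anonymous function sees a scalar `x`,
#    so `x + [5, 6]` is scalar-plus-vector, which has no method.
scalar_plus_vector_failed = try
    (x -> x + [5, 6]).(a)
    false
catch err
    err isa MethodError
end
dot_macro_sum = @. a + [5, 6]   # broadcasts `a` against `[5, 6]`

# 3. Expensive calls: count how often `expensive` runs.
call_count = Ref(0)
expensive() = (call_count[] += 1; 1)

call_count[] = 0
(x -> x + expensive()).(a)      # the call sits inside the per-element function
calls_per_row = call_count[]

call_count[] = 0
@. a + $expensive()             # `$` keeps the call out of the broadcast
calls_with_dollar = call_count[]
```

Comparing `calls_per_row` with `calls_with_dollar` shows the cost difference that the `@time` measurements above report: the per-row function runs the call once for each element, while the `$`-escaped call runs once in total.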

## Working with column names programmatically with `cols`

DataFramesMeta provides the special syntax `cols` for referring to

Review comment: Mention `@byrow` at the top of the file? Rather than starting the section with technical details, it would be more user-friendly to say what `@byrow` does first, then show examples, and only then mention `ByRow` and the fact that `@byrow` isn't a real macro.

Review comment: I think the second paragraph still applies: it would be nice to start with a sentence or two saying that `@byrow` allows writing code that is applied to each row instead of having to vectorize it.