-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add rename macro, docs, and tests #343
Conversation
690c49d
to
d6f8dcd
Compare
src/macros.jl
Outdated
* `::AbstractDataFrame` | ||
|
||
Inputs to `@rename` can come in two formats: a `begin ... end` block, or as a series of | ||
arguments and keyword-like arguments. For example, the following are equivalent: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not clear for me in the docstring what "arguments" means and what "keyword-like arguments" means. In particular if we have a form lhs = rhs
what values are accepted on both sides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be also good too mention that whole expressions can be escaped by $
(like in the tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @bkamins. Thanks for the review! I've updated the docs to clarify phrasing regarding the points you raised above. I'm happy to continue revising as you see fit. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thank you!
Thanks for your work on this! I think we need to re-think how it's done, however. Currently, the whole RHS of the expression is passed to the whole
This seems like it could be a source of bugs. That said, I like how we restrict the RHS to be a "valid" column identifier. I think it's good to be consistent about differences between columns and data, even if some users might find it annoying. Can you try writing a helper function instead of |
Sure! I added a helper function called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/parsing.jl
Outdated
@@ -387,6 +387,54 @@ fun_to_vec(ex::QuoteNode; | |||
gensym_names::Bool=false, | |||
outer_flags::Union{NamedTuple, Nothing}=nothing) = ex | |||
|
|||
|
|||
""" | |||
fun_to_pair(ex::Expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the name of this function? it's not really fun_to_pair
, more like rename_kw_to_pair
or something.
src/parsing.jl
Outdated
|
||
ex_col = get_column_expr(ex) | ||
if ex_col !== nothing | ||
str_p = MacroTools.postwalk(x->x isa QuoteNode ? String(x.value) : x, ex_col ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very fragile. For example it will fail on
$(x => y)
where x
and y
are both QuoteNode
s.
I propose to just check if an expression is of the form a = b
and if it's not, then just return that expression unaltered. We will deal with any inconsistencies at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what we do in fun_to_vec
with the lines
ex_col = get_column_expr(ex)
if ex_col !== nothing
return ex_col
end
src/parsing.jl
Outdated
|
||
lhs = let t = ex.args[1] | ||
if t isa Symbol | ||
t = QuoteNode(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't handle this case. Just throw an error. This is a new function and we don't want old behavior to live forever.
See what error is thrown if you omit this check. If it's an okay error, that's good. If the error is something really obscure and hard for users to understand, add an explicit error message.
src/parsing.jl
Outdated
s | ||
end | ||
|
||
rhs = MacroTools.unblock(ex.args[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the unblock here? If you want to do complicated things, you would still need to hide things behind $(...)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unblock allows for the use of begin ... end
in cases such as the following:
@rename df :newcol = begin
$("old_col" * "1")
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't you need a $
in there though? Since we don't allow arbitrary expressions on the RHS that are not wrapped in $()
. So this seems redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I edited the above to include $
. If I remove the unblock as in the following:
rhs = ex.args[2]
rhs_col = get_column_expr(rhs)
if rhs_col === nothing
throw(ArgumentError("Invalid column identifier on RHS in DataFramesMeta.jl macro"))
end
# parsing.jl:424
then
@rename df :newcol = begin
$("old_col" * "1")
end
throws Invalid column identifier on RHS in DataFramesMeta.jl macro
. I'll look this over more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its okay to disallow the second case. People can do
$( begin
end)```
if they want to.
src/macros.jl
Outdated
x, exprs, outer_flags, kw = get_df_args_kwargs(x, args...; wrap_byrow = false) | ||
t = (fun_to_pair(ex) for ex in exprs) | ||
quote | ||
$DataFrames.rename!($x, $(t...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended from comments below. Instead of doing String(lhs) => String(rhs)
in the parsing stage, let's clean up the inputs all together.
I would allocate an intermediate array of renaming calls. then write a function that turns every vector of pairs into a vector of pairs of strings. Then pass that to rename
. Something like
julia> function pairs_to_str_pairs(args...)
map(args) do arg
if !(arg isa Pair)
throw(ArgumentError("Non-pair created in @rename"))
end
string(first(arg)) => string(last(arg))
end
end;
then do
$DataFrames.rename!($x, $pairs_to_str_pairs($(t...)...))
This means you don't have to do MacroTools.postwalk
in the parsing stage. See the function make_source_concrete
for inspiration. It's used when we have more than one input column, i.e. @rtransform df :x = :a + :b
Edge case: What happens when we want to rename the first column, i.e. @rename df :a = $1
. I want this to work, particularly because rename(df, 1 => :a)
works. But I don't want to mess around with names(df, 1)
in this stage. So its a tough balance. Fortunately we get to point fingers at DataFrames.jl for any limitations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've resolved your comments aside from this edge case. Could we handle this case with a modified version
of pairs_to_str_pairs
?
function pairs_to_str_pairs(args...)
map(args) do arg
if !(arg isa Pair)
throw(ArgumentError("Non-pair created in @rename"))
end
if first(arg) isa Int
return first(arg) => string(last(arg))
end
string(first(arg)) => string(last(arg))
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this seems good!
Remember, this is done at call-time, not at the parsing stage.
Thanks! Sorry, I thought you had pushed your changes. Please push them and I will take a look. |
df = DataFrame(old_col1 = rand(10), old_col2 = rand([missing;2:10],10),old_col3 = rand(10)); | ||
|
||
# rename test set | ||
@test @rename(df, :new1 = :old_col1) ≅ rename(df, :old_col1 => :new1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some tests with $"a string" = ...
i.e. $
on the LHS?
@MatthewRGonzalez Is this ready for a review? |
@pdeffebach yes, thanks! |
end | ||
|
||
if first(arg) isa Int | ||
return first(arg) => string(last(arg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work. One small thing. I think string
is too strong here. Here's the issue. :(AsTable(:x))
passes get_column_expr
because we allow AsTable
to be used as a column identifier identifier. inside transformations. So you end up calling string(:old_col) => string(AsTable(:x))
The problem is that string(AsTable(:x))
gets turned into "AsTable(x)"
.
The solution is two-fold
- Make a new function called
get_column_expr_rename
which only looks for:x
and$(...)
, noAsTable
- Add more error handling in the
pairs_to_str_pairs
function.
I've done both and added more tests. I've also re-written the docstring.
So I think this is good to merge!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, @pdeffebach. Thanks for adding the style fixes, error handling, and adding get_column_expr_rename
as well!
This pull request resolves #204 by providing new functionalities to
DataFramesMeta.jl
New Macros
The following macros are added
@rename
and@rename!
These macros extend the functionality of
DataFrames.rename
andDataFrames.rename!
to theDataFramesMeta.jl
ecosystem. These macros also allow arbitrary RHS expressions (per #204 (comment)).Tests
rename.jl
runtests.jl
For convenience, I added a simple helper function that converts pair elements of the final argument transformation to strings. I'm happy to make any modifications. Thanks!