Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rename macro, docs, and tests #343

Merged
merged 11 commits into from
Feb 6, 2023

Conversation

MatthewRGonzalez
Copy link
Contributor

@MatthewRGonzalez MatthewRGonzalez commented Nov 3, 2022

This pull request resolves #204 by providing new functionalities to DataFramesMeta.jl

New Macros

The following macros are added

  • @rename and @rename!

These macros extend the functionality of DataFrames.rename and DataFrames.rename! to the DataFramesMeta.jl ecosystem. These macros also allow arbitrary RHS expressions (per #204 (comment)).

Tests

  • General tests are added in rename.jl
  • Tests pass locally in runtests.jl

For convenience, I added a simple helper function that converts pair elements of the final argument transformation to strings. I'm happy to make any modifications. Thanks!

src/macros.jl Outdated
* `::AbstractDataFrame`

Inputs to `@rename` can come in two formats: a `begin ... end` block, or as a series of
arguments and keyword-like arguments. For example, the following are equivalent:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not clear for me in the docstring what "arguments" means and what "keyword-like arguments" means. In particular if we have a form lhs = rhs what values are accepted on both sides.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be also good too mention that whole expressions can be escaped by $ (like in the tests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @bkamins. Thanks for the review! I've updated the docs to clarify phrasing regarding the points you raised above. I'm happy to continue revising as you see fit. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you!

@pdeffebach
Copy link
Collaborator

Thanks for your work on this!

I think we need to re-think how it's done, however.

Currently, the whole RHS of the expression is passed to the whole fun_to_vec machinery. That is, for complicated expressions, we make anonymous functions and pass those to the renaming function.

julia> MacroTools.@macroexpand(@rename df :new = :old_col1 * :old_col3) |> MacroTools.prettify
:(make_renamed(df, DataFramesMeta.make_source_concrete([:old_col1, :old_col3]) => ((*) => :new)))

julia> @rename df :new = :old_col1 * :old_col3
ERROR: ArgumentError: Tried renaming :[:old_col1, :old_col3] to :(*) => :new, when :[:old_col1, :old_col3] does not exist in the data frame.

This seems like it could be a source of bugs.

That said, I like how we restrict the RHS to be a "valid" column identifier. I think it's good to be consistent about differences between columns and data, even if some users might find it annoying.

Can you try writing a helper function instead of fun_to_vec which just checks that the LHS and the RHS are both column identifiers and then makes a pair?

@MatthewRGonzalez
Copy link
Contributor Author

MatthewRGonzalez commented Nov 9, 2022

Can you try writing a helper function instead of fun_to_vec which just checks that the LHS and the RHS are both column identifiers and then makes a pair?

Sure! I added a helper function called fun_to_pair in parsing.jl. This function is a simpler version of fun_to_vec that relies on get_column_expr() to determine the validity of column identifiers. Please let me know if this is in line with you're expectations. Thanks!

Copy link
Collaborator

@pdeffebach pdeffebach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

src/parsing.jl Outdated
@@ -387,6 +387,54 @@ fun_to_vec(ex::QuoteNode;
gensym_names::Bool=false,
outer_flags::Union{NamedTuple, Nothing}=nothing) = ex


"""
fun_to_pair(ex::Expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the name of this function? it's not really fun_to_pair, more like rename_kw_to_pair or something.

src/parsing.jl Outdated

ex_col = get_column_expr(ex)
if ex_col !== nothing
str_p = MacroTools.postwalk(x->x isa QuoteNode ? String(x.value) : x, ex_col )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very fragile. For example it will fail on

$(x => y)

where x and y are both QuoteNodes.

I propose to just check if an expression is of the form a = b and if it's not, then just return that expression unaltered. We will deal with any inconsistencies at runtime.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we do in fun_to_vec with the lines

    ex_col = get_column_expr(ex)
    if ex_col !== nothing
        return ex_col
    end

src/parsing.jl Outdated

lhs = let t = ex.args[1]
if t isa Symbol
t = QuoteNode(t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't handle this case. Just throw an error. This is a new function and we don't want old behavior to live forever.

See what error is thrown if you omit this check. If it's an okay error, that's good. If the error is something really obscure and hard for users to understand, add an explicit error message.

src/parsing.jl Outdated
s
end

rhs = MacroTools.unblock(ex.args[2])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the unblock here? If you want to do complicated things, you would still need to hide things behind $(...).

Copy link
Contributor Author

@MatthewRGonzalez MatthewRGonzalez Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unblock allows for the use of begin ... end in cases such as the following:

@rename df :newcol = begin
    $("old_col" * "1")
end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't you need a $ in there though? Since we don't allow arbitrary expressions on the RHS that are not wrapped in $(). So this seems redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I edited the above to include $. If I remove the unblock as in the following:

rhs = ex.args[2]
rhs_col = get_column_expr(rhs)
if rhs_col === nothing
    throw(ArgumentError("Invalid column identifier on RHS in DataFramesMeta.jl macro"))
end

# parsing.jl:424

then

@rename df :newcol = begin
    $("old_col" * "1")
end

throws Invalid column identifier on RHS in DataFramesMeta.jl macro. I'll look this over more.

Copy link
Collaborator

@pdeffebach pdeffebach Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its okay to disallow the second case. People can do

$( begin 


end)```

if they want to. 

src/macros.jl Outdated
x, exprs, outer_flags, kw = get_df_args_kwargs(x, args...; wrap_byrow = false)
t = (fun_to_pair(ex) for ex in exprs)
quote
$DataFrames.rename!($x, $(t...))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended from comments below. Instead of doing String(lhs) => String(rhs) in the parsing stage, let's clean up the inputs all together.

I would allocate an intermediate array of renaming calls. then write a function that turns every vector of pairs into a vector of pairs of strings. Then pass that to rename. Something like

julia> function pairs_to_str_pairs(args...)
           map(args) do arg
               if !(arg isa Pair)
                   throw(ArgumentError("Non-pair created in @rename"))
               end
                   
               string(first(arg)) => string(last(arg))
           end
       end;

then do

        $DataFrames.rename!($x, $pairs_to_str_pairs($(t...)...))

This means you don't have to do MacroTools.postwalk in the parsing stage. See the function make_source_concrete for inspiration. It's used when we have more than one input column, i.e. @rtransform df :x = :a + :b

Edge case: What happens when we want to rename the first column, i.e. @rename df :a = $1. I want this to work, particularly because rename(df, 1 => :a) works. But I don't want to mess around with names(df, 1) in this stage. So its a tough balance. Fortunately we get to point fingers at DataFrames.jl for any limitations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've resolved your comments aside from this edge case. Could we handle this case with a modified version
of pairs_to_str_pairs?

function pairs_to_str_pairs(args...)
           map(args) do arg
               if !(arg isa Pair)
                   throw(ArgumentError("Non-pair created in @rename"))
               end              
               if first(arg) isa Int            
                   return first(arg) => string(last(arg))
               end      
               string(first(arg)) => string(last(arg))
           end
end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this seems good!

Remember, this is done at call-time, not at the parsing stage.

@pdeffebach
Copy link
Collaborator

Thanks!

Sorry, I thought you had pushed your changes. Please push them and I will take a look.

df = DataFrame(old_col1 = rand(10), old_col2 = rand([missing;2:10],10),old_col3 = rand(10));

# rename test set
@test @rename(df, :new1 = :old_col1) ≅ rename(df, :old_col1 => :new1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests with $"a string" = ... i.e. $ on the LHS?

@pdeffebach
Copy link
Collaborator

@MatthewRGonzalez Is this ready for a review?

@MatthewRGonzalez
Copy link
Contributor Author

@pdeffebach yes, thanks!

end

if first(arg) isa Int
return first(arg) => string(last(arg))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatthewRGonzalez

Thank you for your work. One small thing. I think string is too strong here. Here's the issue. :(AsTable(:x)) passes get_column_expr because we allow AsTable to be used as a column identifier identifier. inside transformations. So you end up calling string(:old_col) => string(AsTable(:x))

The problem is that string(AsTable(:x)) gets turned into "AsTable(x)".

The solution is two-fold

  1. Make a new function called get_column_expr_rename which only looks for :x and $(...), no AsTable
  2. Add more error handling in the pairs_to_str_pairs function.

I've done both and added more tests. I've also re-written the docstring.

So I think this is good to merge!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, @pdeffebach. Thanks for adding the style fixes, error handling, and adding get_column_expr_rename as well!

@pdeffebach pdeffebach merged commit 3c5ba5f into JuliaData:master Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the @rename macro
3 participants