add rename macro, docs, and tests #343

Merged 11 commits on Feb 6, 2023
Changes from 5 commits

1 change: 1 addition & 0 deletions src/DataFramesMeta.jl
@@ -15,6 +15,7 @@ export @with,
@subset, @subset!, @rsubset, @rsubset!,
@orderby, @rorderby,
@by, @combine,
@rename, @rename!,
@transform, @select, @transform!, @select!,
@rtransform, @rselect, @rtransform!, @rselect!,
@distinct, @rdistinct, @distinct!, @rdistinct!,
212 changes: 212 additions & 0 deletions src/macros.jl
@@ -2748,3 +2748,215 @@ julia> @rdistinct!(df, :x + :y)
macro rdistinct!(d, args...)
    esc(rdistinct!_helper(d, args...))
end

##############################################################################
##
## @rename - rename columns with keyword args
##
##############################################################################
function rename_helper(x, args...)
    x, exprs, outer_flags, kw = get_df_args_kwargs(x, args...; wrap_byrow = false)
    t = (rename_kw_to_pair(ex) for ex in exprs)
    quote
        $DataFrames.rename($x, $pairs_to_str_pairs($(t...))...)
    end
end

"""
@rename(d, args...)

Change column names.

### Arguments

* `d` : an AbstractDataFrame
* `args...` : expressions of the form `:new = :old` specifying the change of a column's name
from "old" to "new". The left- and right-hand side of each expression can be passed as
symbol arguments, as in `:old_col`, or strings escaped with `$DOLLAR` as in `$DOLLAR"new_col"`.
See **Details** for a description of accepted values.

### Returns

* `::AbstractDataFrame`

Inputs to `@rename` can come in two formats: a `begin ... end` block, or as a series of
keyword-like arguments. For example, the following are equivalent:

```julia
@rename df begin
    :new_col = :old_col
end
```

and

```
@rename(df, :new_col = :old_col)
```

### Details

Both the left- and right-hand side of an expression specifying a column name assignment
can be either a `Symbol` or a `String` escaped with `$DOLLAR`. For example, `:new = ...`
and `$(DOLLAR)"new" = ...` are both valid ways of assigning a new column name.

This idea can be extended to pass arbitrary right-hand side expressions. For example,
the following are equivalent:

```
@rename(df, :new = :old_col1)
```

and

```
@rename(df, :new = $DOLLAR("old_col" * "1"))
```

### Examples
```
julia> df = DataFrame(old_col1 = rand(5), old_col2 = rand(5), old_col3 = rand(5));

julia> @rename(df, :new1 = :old_col1)
5×3 DataFrame
Row │ new1 old_col2 old_col3
│ Float64 Float64 Float64
─────┼────────────────────────────────
1 │ 0.0176206 0.493592 0.348072
2 │ 0.861545 0.512254 0.85763
3 │ 0.263082 0.0267507 0.696494
4 │ 0.643179 0.299391 0.780125
5 │ 0.731267 0.18905 0.767292

julia> @rename(df, :new1 = :old_col1, :new2 = $DOLLAR"old_col2")
5×3 DataFrame
Row │ new1 new2 old_col3
│ Float64 Float64 Float64
─────┼────────────────────────────────
1 │ 0.0176206 0.493592 0.348072
2 │ 0.861545 0.512254 0.85763
3 │ 0.263082 0.0267507 0.696494
4 │ 0.643179 0.299391 0.780125
5 │ 0.731267 0.18905 0.767292

julia> @rename(df, :new1 = $DOLLAR("old_col" * "1"), :new2 = :old_col2)
5×3 DataFrame
Row │ new1 new2 old_col3
│ Float64 Float64 Float64
─────┼────────────────────────────────
1 │ 0.0176206 0.493592 0.348072
2 │ 0.861545 0.512254 0.85763
3 │ 0.263082 0.0267507 0.696494
4 │ 0.643179 0.299391 0.780125
5 │ 0.731267 0.18905 0.767292
```
"""
macro rename(x, args...)
    esc(rename_helper(x, args...))
end

function rename!_helper(x, args...)
    x, exprs, outer_flags, kw = get_df_args_kwargs(x, args...; wrap_byrow = false)
    t = (rename_kw_to_pair(ex) for ex in exprs)
    quote
        $DataFrames.rename!($x, $pairs_to_str_pairs($(t...))...)
    end
end

"""
@rename!(d, args...)

In-place modification of column names.

### Arguments

* `d` : an AbstractDataFrame
* `args...` : expressions of the form `:new = :old` specifying the change of a column's name
from "old" to "new". The left- and right-hand side of each expression can be passed as
symbol arguments, as in `:old_col`, or strings escaped with `$DOLLAR` as in `$DOLLAR"new_col"`.
See **Details** for a description of accepted values.

### Returns

* `::AbstractDataFrame`

Inputs to `@rename!` can come in two formats: a `begin ... end` block, or as a series of
keyword-like arguments. For example, the following are equivalent:

```julia
@rename! df begin
    :new_col = :old_col
end
```

and

```
@rename!(df, :new_col = :old_col)
```

### Details

Both the left- and right-hand side of an expression specifying a column name assignment
can be either a `Symbol` or a `String` escaped with `$DOLLAR`. For example, `:new = ...`
and `$(DOLLAR)"new" = ...` are both valid ways of assigning a new column name.

This idea can be extended to pass arbitrary right-hand side expressions. For example,
the following are equivalent:

```
@rename!(df, :new = :old_col1)
```

and

```
@rename!(df, :new = $DOLLAR("old_col" * "1"))
```

### Examples
```
julia> df = DataFrame(old_col1 = rand(5), old_col2 = rand(5), old_col3 = rand(5));

julia> @rename!(df, :new1 = :old_col1)
5×3 DataFrame
Row │ new1 old_col2 old_col3
│ Float64 Float64 Float64
─────┼────────────────────────────────
1 │ 0.0176206 0.493592 0.348072
2 │ 0.861545 0.512254 0.85763
3 │ 0.263082 0.0267507 0.696494
4 │ 0.643179 0.299391 0.780125
5 │ 0.731267 0.18905 0.767292

julia> df = DataFrame(old_col1 = rand(5), old_col2 = rand(5), old_col3 = rand(5));

julia> @rename!(df, :new1 = :old_col1, :new2 = $DOLLAR"old_col2")
5×3 DataFrame
Row │ new1 new2 old_col3
│ Float64 Float64 Float64
─────┼────────────────────────────────
1 │ 0.0176206 0.493592 0.348072
2 │ 0.861545 0.512254 0.85763
3 │ 0.263082 0.0267507 0.696494
4 │ 0.643179 0.299391 0.780125
5 │ 0.731267 0.18905 0.767292

julia> df = DataFrame(old_col1 = rand(5), old_col2 = rand(5), old_col3 = rand(5));

julia> @rename!(df, :new1 = $DOLLAR("old_col" * "1"), :new2 = :old_col2)
5×3 DataFrame
Row │ new1 new2 old_col3
│ Float64 Float64 Float64
─────┼────────────────────────────────
1 │ 0.0176206 0.493592 0.348072
2 │ 0.861545 0.512254 0.85763
3 │ 0.263082 0.0267507 0.696494
4 │ 0.643179 0.299391 0.780125
5 │ 0.731267 0.18905 0.767292
```
"""
macro rename!(x, args...)
    esc(rename!_helper(x, args...))
end

56 changes: 56 additions & 0 deletions src/parsing.jl
@@ -387,6 +387,62 @@ fun_to_vec(ex::QuoteNode;
gensym_names::Bool=false,
outer_flags::Union{NamedTuple, Nothing}=nothing) = ex


"""
rename_kw_to_pair(ex::Expr)

Given an expression where the left- and right-hand sides
are both valid column identifiers, i.e., a `QuoteNode`
or an expression beginning with `$DOLLAR`, or a "full" expression of the form
`$DOLLAR(:x => :y)`, return an expression in which arguments of type
`QuoteNode` are converted to `String`.
"""
function rename_kw_to_pair(ex::Expr)

    ex_col = get_column_expr(ex)
    if ex_col !== nothing
        return ex_col
    end

    lhs = let t = ex.args[1]

        s = get_column_expr(t)
        if s === nothing
            throw(ArgumentError("Invalid column identifier on LHS in DataFramesMeta.jl macro"))
        end

        s
    end

    rhs = MacroTools.unblock(ex.args[2])
Collaborator

Do we really need the unblock here? If you want to do complicated things, you would still need to hide things behind $(...).

Contributor Author

@MatthewRGonzalez MatthewRGonzalez Dec 6, 2022

The unblock allows for the use of begin ... end in cases such as the following:

@rename df :newcol = begin
    $("old_col" * "1")
end

Collaborator

Wouldn't you need a $ in there though? Since we don't allow arbitrary expressions on the RHS that are not wrapped in $(). So this seems redundant.

Contributor Author

You're right. I edited the above to include $. If I remove the unblock as in the following:

rhs = ex.args[2]
rhs_col = get_column_expr(rhs)
if rhs_col === nothing
    throw(ArgumentError("Invalid column identifier on RHS in DataFramesMeta.jl macro"))
end

# parsing.jl:424

then

@rename df :newcol = begin
    $("old_col" * "1")
end

throws Invalid column identifier on RHS in DataFramesMeta.jl macro. I'll look this over more.

Collaborator

@pdeffebach pdeffebach Dec 8, 2022

I think it's okay to disallow the second case. People can do

$(begin
    ...
end)

if they want to. 
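
To make the thread above concrete, here is a minimal sketch in plain Julia of why the `begin ... end` form fails without unblocking: the right-hand side parses as a `:block` expression rather than a `$` expression. `unblock_sketch` is a hypothetical stand-in written for illustration, assumed to approximate what `MacroTools.unblock` does; it is not the library's code.

```julia
# Hypothetical stand-in for MacroTools.unblock (an assumption, not the library code):
# drop line-number nodes and unwrap a block that holds a single expression.
function unblock_sketch(ex)
    ex isa Expr && ex.head === :block || return ex
    body = filter(a -> !(a isa LineNumberNode), ex.args)
    length(body) == 1 ? unblock_sketch(body[1]) : ex
end

# Parse from a string so that `$` stays in the expression instead of interpolating.
ex = Meta.parse(":newcol = begin\n    \$(\"old_col\" * \"1\")\nend")

lhs = ex.args[1]   # the quoted symbol on the left-hand side
rhs = ex.args[2]   # the right-hand side, still wrapped in a block

println(rhs.head)                  # head of the raw RHS
println(unblock_sketch(rhs).head)  # head after unwrapping the block
```

Wrapping the whole block in `$( ... )`, as suggested above, sidesteps the problem because the outermost expression is then already a `$` expression.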

    rhs_col = get_column_expr(rhs)

    if rhs_col === nothing
        throw(ArgumentError("Invalid column identifier on RHS in DataFramesMeta.jl macro"))
    end

    if rhs_col !== nothing
        src = rhs_col
        dest = lhs
        return :($src => $dest)
    end

end

function pairs_to_str_pairs(args...)

    map(args) do arg
        if !(arg isa Pair)
            throw(ArgumentError("Non-pair created in @rename"))
        end

        if first(arg) isa Int
            return first(arg) => string(last(arg))
Collaborator

@MatthewRGonzalez

Thank you for your work. One small thing: I think string is too strong here. The issue is that :(AsTable(:x)) passes get_column_expr, because we allow AsTable to be used as a column identifier inside transformations. So you end up calling string(:old_col) => string(AsTable(:x)).

The problem is that string(AsTable(:x)) gets turned into "AsTable(x)".

The solution is two-fold

  1. Make a new function called get_column_expr_rename which only looks for :x and $(...), no AsTable
  2. Add more error handling in the pairs_to_str_pairs function.

I've done both and added more tests. I've also re-written the docstring.

So I think this is good to merge!

Contributor Author

Good catch, @pdeffebach. Thanks for adding the style fixes, error handling, and adding get_column_expr_rename as well!
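
The second part of the fix, stricter checks in `pairs_to_str_pairs`, might look roughly like the sketch below. This is an illustration only, not the code that was merged: the name `strict_pairs_to_str_pairs`, the error messages, and the `Wrapper` type (a stand-in for an object such as `AsTable(:x)`) are all assumptions.

```julia
# Hypothetical stand-in for a non-name object such as AsTable(:x).
struct Wrapper
    col::Symbol
end

# Sketch of a stricter conversion: only Symbols and strings (or an Int source)
# are accepted, so wrapper objects are rejected instead of being stringified.
function strict_pairs_to_str_pairs(args...)
    map(args) do arg
        arg isa Pair || throw(ArgumentError("Non-pair created in @rename"))
        src, dest = first(arg), last(arg)

        dest isa Union{Symbol, AbstractString} ||
            throw(ArgumentError("New column name must be a Symbol or a string"))

        # An integer source is a column position and is passed through as-is.
        src isa Int && return src => string(dest)

        src isa Union{Symbol, AbstractString} ||
            throw(ArgumentError("Old column name must be a Symbol, a string, or an Int"))

        string(src) => string(dest)
    end
end

println(strict_pairs_to_str_pairs(:old => :new, 2 => "second"))
```

Checking the types before calling `string` means a value like `Wrapper(:x)` raises an `ArgumentError` rather than silently becoming a column name.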

        end
        string(first(arg)) => string(last(arg))
    end
end

function make_source_concrete(x::AbstractVector)
    if length(x) == 1 && x[1] isa AsTable
        return x[1]