Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally order columns of multiple value.var in dcast() by RHS of formula #2601

Open
UweBlock opened this issue Feb 1, 2018 · 1 comment
Labels
reshape dcast melt

Comments

@UweBlock
Copy link
Contributor

UweBlock commented Feb 1, 2018

When reshaping multiple value.vars from long to wide format with dcast(), the columns are ordered such that the value.vars are grouped together, e.g.,

# sample data
library(data.table)
long <- structure(list(id = c(1L, 1L, 2L, 2L), year = structure(c(1L, 
2L, 2L, 3L), class = "factor", .Label = c("2007", "2008", "2009"
)), X1 = c(12007.1, 12008.1, 22008.1, 22009.1), X2 = c("12007_2", 
"12008_2", "22008_2", "22009_2")), .Names = c("id", "year", "X1", 
"X2"), class = c("data.table", "data.frame"), row.names = c(NA, -4L))
long
   id year      X1      X2
1:  1 2007 12007.1 12007_2
2:  1 2008 12008.1 12008_2
3:  2 2008 22008.1 22008_2
4:  2 2009 22009.1 22009_2
# reshape from to long to wide format
cols <- c("X1", "X2")
dcast(long, id ~ year, value.var = cols)
   id X1_2007 X1_2008 X1_2009 X2_2007 X2_2008 X2_2009
1:  1 12007.1 12008.1      NA 12007_2 12008_2      NA
2:  2      NA 22008.1 22009.1      NA 22008_2 22009_2

There are some questions on SO which expect the order of columns to be grouped by the RHS:
Reshape data within groups - groups in a single row
How to reshape tabular data to one row per group

There is no option in dcast() yet which allows to specify the order of reshaped columns.

Workaround

As a workaround, the column order can be rearranged using setcolorder() but this requires a lot of manual coding:

# reorder columns to group by RHS
wide <- dcast(long, id ~ year, value.var = cols)
new_col_order <- CJ(unique(long$year), cols)[, paste(V2, V1, sep = "_")]
setcolorder(wide, c(setdiff(names(wide), new_col_order), new_col_order))
wide
   id X1_2007 X2_2007 X1_2008 X2_2008 X1_2009 X2_2009
1:  1 12007.1 12007_2 12008.1 12008_2      NA      NA
2:  2      NA      NA 22008.1 22008_2 22009.1 22009_2

Now, the columns are order by the RHS (year). However, every change to the formula requires to amend the code to reorder the columns:

wide <- dcast(long, id ~ rowid(id), value.var = cols)
new_col_order <- CJ(seq_len(uniqueN(long$id)), cols)[, paste(V2, V1, sep = "_")]
setcolorder(wide, c(setdiff(names(wide), new_col_order), new_col_order))
wide
   id    X1_1    X2_1    X1_2    X2_2
1:  1 12007.1 12007_2 12008.1 12008_2
2:  2 22008.1 22008_2 22009.1 22009_2

Also, with a list of functions:

wide <- dcast(long, id ~ year, fun.aggregate = list(mean, sd), value.var = "X1")
new_col_order <- CJ(unique(long$year), c("mean", "sd"))[, paste("X1", V2, V1, sep = "_")]
setcolorder(wide, c(setdiff(names(wide), new_col_order), new_col_order))
wide
   id X1_mean_2007 X1_sd_2007 X1_mean_2008 X1_sd_2008 X1_mean_2009 X1_sd_2009
1:  1      12007.1         NA      12008.1         NA          NaN         NA
2:  2          NaN         NA      22008.1         NA      22009.1         NA

Related Issues

There are related issues which deal with naming of output columns in first place but not with order
#1153
#1951
(Note, I am happy with the current naming convention)

Output of sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
library(data.table)
data.table 1.10.5 IN DEVELOPMENT built 2018-01-23 05:02:02 UTC; appveyor
@jangorecki jangorecki added the reshape dcast melt label Sep 21, 2019
@jangorecki jangorecki changed the title [Request] Optionally order columns of multiple value.var in dcast() grouped by RHS Optionally order columns of multiple value.var in dcast() grouped by RHS Sep 21, 2019
@UweBlock UweBlock changed the title Optionally order columns of multiple value.var in dcast() grouped by RHS Optionally order columns of multiple value.var in dcast() grouped by RHS of formula Oct 9, 2021
@UweBlock UweBlock changed the title Optionally order columns of multiple value.var in dcast() grouped by RHS of formula Optionally order columns of multiple value.var in dcast() by RHS of formula Oct 9, 2021
@UweBlock
Copy link
Contributor Author

UweBlock commented Oct 9, 2021

There is a recent question on SO Automatically order a wide data.table: dcast columns in a specific order/setcolorder based on a pattern with numbers.

This brings me the idea to use patterns in addition to colorder = "value.var" (the default) or colorder = "rhs". So, we might have something like
colorder = pattern(..., regex)
where regex is a regular expression with groups (in parantheses) and ... giving the order to sort the columns.

Perhaps, this feature might better become an enhancment of setcolorder()?

Giving a second thought, something like
colorder = forderv(c("year", "X2", "X1")) or colorder = forderv(c("year", "value.var"))
could work as well, e.g.,

cols <- c("X1", "X2")
wide <- dcast(long, id ~ year, value.var = cols, colorder = forderv(c("year", cols))  

would return

   id X1_2007 X2_2007 X1_2008 X2_2008 X1_2009 X2_2009
1:  1 12007.1 12007_2 12008.1 12008_2      NA      NA
2:  2      NA      NA 22008.1 22008_2 22009.1 22009_2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reshape dcast melt
Projects
None yet
Development

No branches or pull requests

2 participants