-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warn about use of $ in aes() #2693
Comments
I think what needs to be caught are vectors of constants in Here is a slightly modified reprex from the off-topic discussion in #2660, to highlight the issues: library(ggplot2)
df <- data.frame(x = 1:4,
y = 1:4,
label = c("a", "b", "a", "b"),
colour = c("magenta", "green", "magenta", "green"))
# using a data column inside aes causes problems, the colors don't match within facets
ggplot(df, aes(x, y, colour = df$colour)) +
geom_point(size = 3) +
facet_wrap(~label) +
scale_colour_identity() # directly handing the colors to `geom_point()` as a parameter is also wrong
ggplot(df, aes(x, y)) +
geom_point(size = 3, colour = df$colour) +
facet_wrap(~label) # this is valid code, however, if a bit weird
df2 <- data.frame(point = "point", label = "label")
ggplot(df, aes(x, y)) +
geom_point(aes(colour = df2$point), size = 3) +
geom_text(aes(x = x + 0.2, label = label, colour = df2$label)) Created on 2018-06-09 by the reprex package (v0.2.0). |
On StackOverflow I used to refer to this example |
Possible implementations:
Neither of those techniques would catch this related problem: df <- data.frame(x = 1:3)
y <- 3:1
ggplot(df, aes(x, y)) + geom_point() Or code that's handy, but potentially problematic like I wonder if we should rethink how the facetting code works to ensure that we don't mess up vectors that don't come from the data? It might be possible to fix by evaluating the aesthetics in a different way (although that might Then identifying |
The problem arises with the reordering of the data frame on this line: Line 202 in 5e9b688
It's not immediately clear to me that the reordering does anything other than make the created data frame look nice. When I replace that line with just library(ggplot2) # patched line 202 in facet-wrap.r
df <- data.frame(x = 1:4,
y = 1:4,
label = c("a", "b", "a", "b"),
colour = c("magenta", "green", "magenta", "green"))
# works
ggplot(df, aes(x, y, colour = df$colour)) +
geom_point(size = 3) +
facet_wrap(~label) +
scale_colour_identity() # works
ggplot(df, aes(x, y)) +
geom_point(size = 3, colour = df$colour) +
facet_wrap(~label) +
scale_colour_identity() Created on 2018-06-10 by the reprex package (v0.2.0). |
FWIW I think it is a simple design mistake (can't remember if I introduced it during the rewrite, but it doesn't matter anyway). |
It's been there a long time, at least since 687ba6e. Let's remove in the next version, and then think about how to make a warning message steering people away from |
I just made a pull request, so the required changes are recorded. I didn't encounter any problems with |
If this is still something worth fixing, I think there are some possibilities. First, I think it is possible to inspect the AST and look for calls to library(rlang)
aes <- ggplot2::aes
is_dollar_sign_call <- function(call) {
is.call(call) && identical(call[[1]], quote(`$`))
}
is_double_bracket_call <- function(call) {
is.call(call) && identical(call[[1]], quote(`[[`))
}
is_extract_call <- function(call) {
is_dollar_sign_call(call) || is_double_bracket_call(call)
}
check_extract_usage_expr <- function(x, data, env = emptyenv()) {
if (is_extract_call(x)) {
data_eval <- eval_tidy(x[[2]], data, env)
if(identical(tracemem(data_eval), tracemem(data))) {
bad_call <- x
good_call <- x
good_call[[2]] <- quote(.data)
warning(
"Use of `", format(bad_call), "` is discouraged. ",
"Use `", format(good_call), "` instead.",
call. = FALSE
)
}
} else if (is.call(x)) {
lapply(x, check_extract_usage_expr, data, env)
} else if (is.pairlist(x)) {
lapply(x, check_extract_usage_expr, data, env)
}
invisible()
}
check_extract_usage_quo <- function(quosure, data) {
check_extract_usage_expr(get_expr(quosure), data, get_env(quosure))
}
check_extract_usage <- function(mapping, data) {
lapply(mapping, check_extract_usage_quo, data)
invisible(mapping)
}
# data
returns_x <- function() "x"
df <- tibble::tibble(x = 1:5, nested_df = tibble::tibble(x = 6:10))
# good
check_extract_usage(aes(x), df)
check_extract_usage(aes(.data$x), df)
check_extract_usage(aes(.data[["x"]]), df)
check_extract_usage(aes(.data[[!!quo("x")]]), df)
check_extract_usage(aes(.data[[returns_x()]]), df)
check_extract_usage(aes(!!sym("x")), df)
check_extract_usage(aes(x * 10), df)
check_extract_usage(aes(nested_df$x), df)
check_extract_usage(aes(nested_df[["x"]]), df)
check_extract_usage(aes(.data[[c("nested_df", "x")]]), df)
check_extract_usage(aes(.data[[c(2, 1)]]), df)
check_extract_usage(aes(.data[[1]]), df)
# bad: use of extraction
check_extract_usage(aes(df$x), df)
#> Warning: Use of `df$x` is discouraged. Use `.data$x` instead.
check_extract_usage(aes(df[["x"]]), df)
#> Warning: Use of `df[["x"]]` is discouraged. Use `.data[["x"]]` instead. Created on 2019-05-28 by the reprex package (v0.2.1) Similarly, it is possible to look for places where no columns from library(rlang)
aes <- ggplot2::aes
is_dollar_sign_call <- function(call) {
is.call(call) && identical(call[[1]], quote(`$`))
}
is_double_bracket_call <- function(call) {
is.call(call) && identical(call[[1]], quote(`[[`))
}
is_extract_call <- function(call) {
is_dollar_sign_call(call) || is_double_bracket_call(call)
}
is_pronoun_operation <- function(call) {
is.call(call) && identical(call[[2]], quote(.data))
}
index_value_to_name <- function(index, data) {
if (is.character(index)) {
index[1]
} else if (is.numeric(index)) {
names(data)[index[1]]
} else {
NULL
}
}
find_data_references <- function(x, data, env = emptyenv()) {
if (is.name(x) && (as.character(x) %in% names(data))) {
as.character(x)
} else if (is_double_bracket_call(x) && is_pronoun_operation(x)) {
index_value_to_name(eval_tidy(x[[3]], data, env), data)
} else if (is_dollar_sign_call(x) && is_pronoun_operation(x)) {
as.character(x[[3]])
} else if(is_dollar_sign_call(x)) {
find_data_references(x[[2]], data, env)
} else if (is.call(x)) {
new_names <- lapply(x, find_data_references, data, env)
unlist(new_names)
} else if (is.pairlist(x)) {
new_names <- lapply(x, find_data_references, data, env)
unlist(new_names)
}
}
find_data_references_quo <- function(quosure, data) {
find_data_references(get_expr(quosure), data, get_env(quosure))
}
find_mapped_cols <- function(mapping, data) {
intersect(unlist(lapply(mapping, find_data_references_quo, data)), names(data))
}
check_mapping <- function(mapping, data) {
data_name <- as_label(enquo(data))
cols <- find_mapped_cols(mapping, data)
if (length(cols) == 0) {
stop("At least one column in `", data_name, "` must be mapped", call. = FALSE)
}
invisible(mapping)
}
# data
returns_x <- function() "x"
df <- tibble::tibble(x = 1:5, nested_df = tibble::tibble(x = 6:10))
# good
check_mapping(aes(x), df)
check_mapping(aes(.data$x), df)
check_mapping(aes(.data[["x"]]), df)
check_mapping(aes(.data[[!!quo("x")]]), df)
check_mapping(aes(.data[[returns_x()]]), df)
check_mapping(aes(!!sym("x")), df)
check_mapping(aes(x * 10), df)
check_mapping(aes(nested_df$x), df)
check_mapping(aes(nested_df[["x"]]), df)
check_mapping(aes(.data[[c("nested_df", "x")]]), df)
check_mapping(aes(.data[[c(2, 1)]]), df)
check_mapping(aes(.data[[1]]), df)
# bad: use of extraction
check_mapping(aes(df$x), df)
#> Error: At least one column in `df` must be mapped
check_mapping(aes(df[["x"]]), df)
#> Error: At least one column in `df` must be mapped
# bad: no columns were mapped
check_mapping(aes(not_a_column_in_df), df)
#> Error: At least one column in `df` must be mapped
check_mapping(aes(not_a_column_in_df$x), df)
#> Error: At least one column in `df` must be mapped
check_mapping(aes(), df)
#> Error: At least one column in `df` must be mapped
# error during evaluation
check_mapping(aes(.data[[not_a_function()]]), df)
#> Error in not_a_function(): could not find function "not_a_function" Created on 2019-05-28 by the reprex package (v0.2.1) |
I think that overall approach has legs, so I think it's worth turning into a PR so I can give it full review. A few quick thoughts before you do that:
|
Yes. It's important that examples such as the following continue to work. Not sure whether this would work with the current implementation of library(ggplot2)
ggplot(mtcars, aes(disp, mpg)) +
geom_point(aes(color = "data")) +
geom_smooth(aes(color = "loess smooth"), se = FALSE)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x' Created on 2019-05-30 by the reprex package (v0.2.1) |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Quite a few people seem to do this, and it introduces potentially subtle bad behaviour when you start facetting. I think we now have sufficient understanding to detect this correctly, although we’ll need to carefully consider how to override
$
so ok usage is not caught.The text was updated successfully, but these errors were encountered: