Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqrt_trans, scale limit expansion, and missing breaks #980

Closed
BrianDiggs opened this issue Jun 26, 2014 · 15 comments · Fixed by #4775
Closed

sqrt_trans, scale limit expansion, and missing breaks #980

BrianDiggs opened this issue Jun 26, 2014 · 15 comments · Fixed by #4775
Labels
feature a feature request or enhancement scales 🐍 tidy-dev-day 🤓 Tidyverse Developer Day

Comments

@BrianDiggs
Copy link
Contributor

Prompted by a posting on the mailing list (https://groups.google.com/d/topic/ggplot2/IUje5H0jwm4).

Summary

Specific problem: Breaks near 0 are not displayed when the square root transformation is applied to a scale.

General problem: Scale expansion in transformed coordinate space can lead to values which are not meaningfully (or correctly) invertable to data space leading to improperly excluded breaks.

Reproducible example:

library("ggplot2")
library("scales")

DF <- data.frame(x = seq(0,1,by=0.1),
                 y = seq(0,1,by=0.1))

ggplot(DF, aes(x=x, y=y)) + 
  geom_point() + 
  scale_x_sqrt() +
  scale_y_continuous()

Expected result

A plot with breaks labeled at 0, 0.25, 0.50, 0.75, and 1.00

Actual results

Actual results

Note that there is no 0 break on the x-axis.

Discussion

The error occurs because when the limits (in coordinate space) are expanded, there are negative values which, when transformed back to data space, give the incorrect limits from which breaks are determined (or at least limited). Stepping through the effective steps that occur for getting the breaks shows:

st <- sqrt_trans()
(x<-st$transform(c(0,1)))
## [1] 0 1
(x<-expand_range(x, 0.05, 0))
## [1] -0.05  1.05
(limits<-st$inverse(x))
## [1] 0.0025 1.1025
(breaks<-st$breaks(limits))
## [1] 0.00 0.25 0.50 0.75 1.00
st$trans(breaks)
## [1] 0.0000 0.5000 0.7071 0.8660 1.0000
st$trans(limits)
## [1] 0.05 1.05
censor(st$trans(breaks), st$trans(limits))
## [1]     NA 0.5000 0.7071 0.8660 1.0000

The real problem is that the result of the expand_range call lies outside the domain of the transformation. How should extra-domain values be treated?

Workarounds

Don't square negative values

One solution to this problem is an alternative transformation, one that does not invert negative values. A transformation should be one-to-one (within its domain) and sqrt_trans is, but it happily will run the inverse on negative values which can not occur if everything is constrained within the domain. A simple approach is to just map all negative values to 0

mysqrt_trans <- function() {
  trans_new("mysqrt", 
            transform = base::sqrt,
            inverse = function(x) ifelse(x<0, 0, x^2),
            domain = c(0, Inf))
}

Squish range before inverting

If we assume that all transformations are monotonic (I'm not sure if ggplot2/scales assume transformations are monotonic or just one-to-one; I can not come up with a useful transformation which is not, though I can create a pathological one.), then it is reasonable to squish any values outside the range (not domain) of the transformation. Bringing them back to the nearest extreme should be sufficient. Therefore a more general approach for an inverse would be

mysqrt_trans <- function() {
  domain <- c(0, Inf)
  transform <- base::sqrt
  range <- transform(domain)
  trans_new("mysqrt", 
            transform = transform,
            inverse = function(x) squish(x, range=range)^2,
            domain = domain)
}

Squish to range whenever values are extended

This approach makes it the responsibility of the code which manipulates transformed (coordinate space) values to squish those to the appropriate range if there is any chance that that range is violated. If monotonicity is assumed, I think any interpolations should be safe, but any operation which can result in a value more extreme than the existing most extreme values would need to be squished. If this approach is taken, it would be worth adding an additional component range to the trans which is just the result of transform(domain).

The transformation, then, could have its inverse just assume that the data is in the range or it can check that before proceeding (just as now transform may or may not check domain before proceeding). Ideally, the transformation should throw an error if either transform is called with values outside domain or inverse is called with values outside range and this would help pick out places where calling code is not behaving appropriately.

@SamGG
Copy link

SamGG commented Aug 7, 2014

I agree the need to have a fully labelled scale. Not sure I understand the workarounds nor any would display the origin, ie zero.
It would be great that ticks are equally spaced in the display; currently ticks seem to be linearly spaced in the original scale.

@hadley
Copy link
Member

hadley commented Jun 12, 2015

This seems like it's a problem that the scales package should be solving? Do you agree?

@BrianDiggs
Copy link
Contributor Author

It should partially be dealt with by scales, but I'm not sure it can completely be handled there. The square root transformation makes a good example as it is only defined for non-negative values. The original problem could be solved by the "Squish range before inverting" option above, which does take place entirely in scales. However, there is a side effect in doing that. The axis can then never go below zero. That includes any absolute padding that ggplot would want to add beyond the limits of the scale (there could never be space between the 0 on one axis and the other axis). I think most of the work should be in scales, but then some thinking about exactly how expand interacts with transformations, especially when taken beyond the domain/range, needs to be made. I have not thought all the way through that.

@hadley
Copy link
Member

hadley commented Jun 19, 2015

Another idea occurred to me - why not make the transformation sign(x) * sqrt(abs(x))? It seems like a useful generalisation, and is a simple solution to this problem (which I think will be tricky to solve generally)

@BrianDiggs
Copy link
Contributor Author

That would work for this particular case. Basically, taking a transformation that is $[0,Inf] \to [0,Inf]$ and replacing it with one that is $[-Inf,Inf] \to [-Inf,Inf]$ that is identical where they overlap and similar/"smooth" at 0. But it won't solve the problem for any other transformation with an edge (e.g. log, arcsin) that have no generalization across the boundary. This change also runs the risk of taking, returning, and plotting negative values where that is not expected.

@krlmlr
Copy link
Member

krlmlr commented Jul 28, 2015

Is this related to #1209?

@BrianDiggs
Copy link
Contributor Author

Quite likely. All the probability transformations have a range (the transform of the domain) that is [0,1] and quite likely have their inverse transformations ill defined beyond that (likely NaN, but I don't know that that is guaranteed). Some of the distribution functions say

Invalid arguments will result in return value NaN, with a warning.

In particular, qnorm (as used by probit) does not have this statement, but does behave this way.

> qnorm(c(0.99999999999999, 1, 1.000000000001))
[1] 7.650731      Inf      NaN
Warning message:
In qnorm(c(0.99999999999999, 1, 1.000000000001)) : NaNs produced
> qnorm(c(-0.0000000000001, 0, 0.000000000001))
[1]       NaN      -Inf -7.034484
Warning message:
In qnorm(c(-1e-13, 0, 1e-12)) : NaNs produced

@paleolimbot
Copy link
Member

I think squishing within trans$domain is reasonable for break calculations. This is a ggplot2-specific problem (unless scale expansion becomes a scales package issue...after #3380 it will at least be consolidated in one file).

library(scales)

trans_inverse_squish <- function(trans, trans_limits) {
  trans_domain <- trans$transform(trans$domain)
  c(
    squish(trans_limits[1], trans_domain), 
    squish(trans_limits[2], trans_domain)
  )
}

trans_inverse_squish(sqrt_trans(), c(0, 1))
#> [1] 0 1
trans_inverse_squish(sqrt_trans(), c(-1, 1))
#> [1] 0 1
trans_inverse_squish(sqrt_trans(), c(-2, -1))
#> [1] 0 0

trans_inverse_squish(identity_trans(), c(0, 1))
#> [1] 0 1
trans_inverse_squish(identity_trans(), c(-1, 1))
#> [1] -1  1
trans_inverse_squish(identity_trans(), c(-2, -1))
#> [1] -2 -1

Created on 2019-06-25 by the reprex package (v0.2.1)

@paleolimbot
Copy link
Member

I think that the body of trans_inverse_squish (above) could be inserted into ScaleContinuous$get_breaks() where it reassigns limits to be in data space:

ggplot2/R/scale-.r

Lines 494 to 500 in e2bdf85

get_breaks = function(self, limits = self$get_limits()) {
if (self$is_empty()) {
return(numeric())
}
# Limits in transformed space need to be converted back to data space
limits <- self$trans$inverse(limits)

@thomasp85
Copy link
Member

@paleolimbot has this (very old) issue been superseded by #3592?

It seems the root cause is always the expansion factor

@paleolimbot
Copy link
Member

Computing breaks on the non-expanded range would fix this specific issue (IIRC), but I don't think breaks should be computed on the unexpanded range (sort of empty looking if I'm remembering right).

If you think any of these are good PR material as you're going through them, feel free to flag me and I will work one up.

@thomasp85
Copy link
Member

I won't be going through all the old ones that already have tags... just wanted to take trip down memory lane... I think it would be nice to clear all these old issues out at some point though, so if you have the time... :-)

@paleolimbot
Copy link
Member

Cool...will wait for the patch release and will see what's the most useful way I could contribute after that.

@teunbrand
Copy link
Collaborator

teunbrand commented Oct 15, 2020

I played around with an implementation like @paleolimbot suggested, wherein the domain is transformed and then used to squish the limits in the ScaleContinuous$get_breaks() method.

This fixes the sqrt_trans() scale problem and works great for other numerical scales too. However, because date_trans() and time_trans() domains are numeric c(-Inf, Inf), they'll throw errors like the one below:

x <- scales::date_trans()
x$transform(x$domain)
#> Error: Invalid input: date_trans works with objects of class Date only

Created on 2020-10-15 by the reprex package (v0.3.0)

I've come across this error before in e.g. #4155, or in an extension zeehio/facetscales#22, or workarounds in ggplot's code:

ggplot2/R/scale-expansion.r

Lines 146 to 149 in 885c3c1

# using the inverse transform to resolve the NA value is needed for date/datetime/time
# scales, which refuse to transform objects of the incorrect type
coord_limits <- coord_limits %||% scale$trans$inverse(c(NA_real_, NA_real_))
coord_limits_scale <- scale$trans$transform(coord_limits)

This makes me think perhaps an issue should be filed in the scales package asking for more leniency on the date_trans()/time_trans() transformations.

@yutannihilation
Copy link
Member

This makes me think perhaps an issue should be filed in the scales package asking for more leniency on the date_trans()/time_trans() transformations.

I too feel this needs to be fixed on scales' side. Possible fixes are

  1. make domain a Date/POSIXct/hms class (it seems they all accept Inf, though they are not printable. c.f. https://twitter.com/yutannihilat_en/status/1316678502880428032)
  2. make transform() accept numeric values

Probably, domain should have the same class as that of the trans, so fix 1 seems straightforward. On the other hand, fix 2 let us use Inf like the code below.

set.seed(10)
df <- data.frame(
  date = Sys.Date() + 1:2,
  price = 1:2
)

ggplot(df, aes(date, price)) +
  geom_line() +
  # annotate("point", x = as.Date(Inf), y = 1.5, size = 30) works
  annotate("point", x = Inf, y = 1.5, size = 30)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement scales 🐍 tidy-dev-day 🤓 Tidyverse Developer Day
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants