sqrt_trans, scale limit expansion, and missing breaks #980

BrianDiggs · 2014-06-26T18:22:03Z

Prompted by a posting on the mailing list (https://groups.google.com/d/topic/ggplot2/IUje5H0jwm4).

Summary

Specific problem: Breaks near 0 are not displayed when the square root transformation is applied to a scale.

General problem: Scale expansion in transformed coordinate space can lead to values which are not meaningfully (or correctly) invertable to data space leading to improperly excluded breaks.

Reproducible example:

library("ggplot2")
library("scales")

DF <- data.frame(x = seq(0,1,by=0.1),
                 y = seq(0,1,by=0.1))

ggplot(DF, aes(x=x, y=y)) + 
  geom_point() + 
  scale_x_sqrt() +
  scale_y_continuous()

Expected result

A plot with breaks labeled at 0, 0.25, 0.50, 0.75, and 1.00

Actual results

Note that there is no 0 break on the x-axis.

Discussion

The error occurs because when the limits (in coordinate space) are expanded, there are negative values which, when transformed back to data space, give the incorrect limits from which breaks are determined (or at least limited). Stepping through the effective steps that occur for getting the breaks shows:

st <- sqrt_trans()
(x<-st$transform(c(0,1)))
## [1] 0 1
(x<-expand_range(x, 0.05, 0))
## [1] -0.05  1.05
(limits<-st$inverse(x))
## [1] 0.0025 1.1025
(breaks<-st$breaks(limits))
## [1] 0.00 0.25 0.50 0.75 1.00
st$trans(breaks)
## [1] 0.0000 0.5000 0.7071 0.8660 1.0000
st$trans(limits)
## [1] 0.05 1.05
censor(st$trans(breaks), st$trans(limits))
## [1]     NA 0.5000 0.7071 0.8660 1.0000

The real problem is that the result of the expand_range call lies outside the domain of the transformation. How should extra-domain values be treated?

Workarounds

Don't square negative values

One solution to this problem is an alternative transformation, one that does not invert negative values. A transformation should be one-to-one (within its domain) and sqrt_trans is, but it happily will run the inverse on negative values which can not occur if everything is constrained within the domain. A simple approach is to just map all negative values to 0

mysqrt_trans <- function() {
  trans_new("mysqrt", 
            transform = base::sqrt,
            inverse = function(x) ifelse(x<0, 0, x^2),
            domain = c(0, Inf))
}

Squish range before inverting

If we assume that all transformations are monotonic (I'm not sure if ggplot2/scales assume transformations are monotonic or just one-to-one; I can not come up with a useful transformation which is not, though I can create a pathological one.), then it is reasonable to squish any values outside the range (not domain) of the transformation. Bringing them back to the nearest extreme should be sufficient. Therefore a more general approach for an inverse would be

mysqrt_trans <- function() {
  domain <- c(0, Inf)
  transform <- base::sqrt
  range <- transform(domain)
  trans_new("mysqrt", 
            transform = transform,
            inverse = function(x) squish(x, range=range)^2,
            domain = domain)
}

Squish to range whenever values are extended

This approach makes it the responsibility of the code which manipulates transformed (coordinate space) values to squish those to the appropriate range if there is any chance that that range is violated. If monotonicity is assumed, I think any interpolations should be safe, but any operation which can result in a value more extreme than the existing most extreme values would need to be squished. If this approach is taken, it would be worth adding an additional component range to the trans which is just the result of transform(domain).

The transformation, then, could have its inverse just assume that the data is in the range or it can check that before proceeding (just as now transform may or may not check domain before proceeding). Ideally, the transformation should throw an error if either transform is called with values outside domain or inverse is called with values outside range and this would help pick out places where calling code is not behaving appropriately.

The text was updated successfully, but these errors were encountered:

SamGG · 2014-08-07T13:09:05Z

I agree the need to have a fully labelled scale. Not sure I understand the workarounds nor any would display the origin, ie zero.
It would be great that ticks are equally spaced in the display; currently ticks seem to be linearly spaced in the original scale.

hadley · 2015-06-12T15:05:58Z

This seems like it's a problem that the scales package should be solving? Do you agree?

BrianDiggs · 2015-06-18T18:21:48Z

It should partially be dealt with by scales, but I'm not sure it can completely be handled there. The square root transformation makes a good example as it is only defined for non-negative values. The original problem could be solved by the "Squish range before inverting" option above, which does take place entirely in scales. However, there is a side effect in doing that. The axis can then never go below zero. That includes any absolute padding that ggplot would want to add beyond the limits of the scale (there could never be space between the 0 on one axis and the other axis). I think most of the work should be in scales, but then some thinking about exactly how expand interacts with transformations, especially when taken beyond the domain/range, needs to be made. I have not thought all the way through that.

hadley · 2015-06-19T12:12:25Z

Another idea occurred to me - why not make the transformation sign(x) * sqrt(abs(x))? It seems like a useful generalisation, and is a simple solution to this problem (which I think will be tricky to solve generally)

BrianDiggs · 2015-06-19T15:42:22Z

That would work for this particular case. Basically, taking a transformation that is $[0,Inf] \to [0,Inf]$ and replacing it with one that is $[-Inf,Inf] \to [-Inf,Inf]$ that is identical where they overlap and similar/"smooth" at 0. But it won't solve the problem for any other transformation with an edge (e.g. log, arcsin) that have no generalization across the boundary. This change also runs the risk of taking, returning, and plotting negative values where that is not expected.

krlmlr · 2015-07-28T19:31:40Z

Is this related to #1209?

BrianDiggs · 2015-07-29T03:15:50Z

Quite likely. All the probability transformations have a range (the transform of the domain) that is [0,1] and quite likely have their inverse transformations ill defined beyond that (likely NaN, but I don't know that that is guaranteed). Some of the distribution functions say

Invalid arguments will result in return value NaN, with a warning.

In particular, qnorm (as used by probit) does not have this statement, but does behave this way.

> qnorm(c(0.99999999999999, 1, 1.000000000001))
[1] 7.650731      Inf      NaN
Warning message:
In qnorm(c(0.99999999999999, 1, 1.000000000001)) : NaNs produced
> qnorm(c(-0.0000000000001, 0, 0.000000000001))
[1]       NaN      -Inf -7.034484
Warning message:
In qnorm(c(-1e-13, 0, 1e-12)) : NaNs produced

paleolimbot · 2019-06-25T13:53:52Z

I think squishing within trans$domain is reasonable for break calculations. This is a ggplot2-specific problem (unless scale expansion becomes a scales package issue...after #3380 it will at least be consolidated in one file).

library(scales)

trans_inverse_squish <- function(trans, trans_limits) {
  trans_domain <- trans$transform(trans$domain)
  c(
    squish(trans_limits[1], trans_domain), 
    squish(trans_limits[2], trans_domain)
  )
}

trans_inverse_squish(sqrt_trans(), c(0, 1))
#> [1] 0 1
trans_inverse_squish(sqrt_trans(), c(-1, 1))
#> [1] 0 1
trans_inverse_squish(sqrt_trans(), c(-2, -1))
#> [1] 0 0

trans_inverse_squish(identity_trans(), c(0, 1))
#> [1] 0 1
trans_inverse_squish(identity_trans(), c(-1, 1))
#> [1] -1  1
trans_inverse_squish(identity_trans(), c(-2, -1))
#> [1] -2 -1

^{Created on 2019-06-25 by the reprex package (v0.2.1)}

paleolimbot · 2019-06-25T14:08:50Z

I think that the body of trans_inverse_squish (above) could be inserted into ScaleContinuous$get_breaks() where it reassigns limits to be in data space:

ggplot2/R/scale-.r

Lines 494 to 500 in e2bdf85

    
           get_breaks = function(self, limits = self$get_limits()) { 
        
             if (self$is_empty()) { 
        
               return(numeric()) 
        
             } 
        
             # Limits in transformed space need to be converted back to data space 
        
             limits <- self$trans$inverse(limits)

thomasp85 · 2020-04-30T19:52:49Z

@paleolimbot has this (very old) issue been superseded by #3592?

It seems the root cause is always the expansion factor

paleolimbot · 2020-04-30T20:11:09Z

Computing breaks on the non-expanded range would fix this specific issue (IIRC), but I don't think breaks should be computed on the unexpanded range (sort of empty looking if I'm remembering right).

If you think any of these are good PR material as you're going through them, feel free to flag me and I will work one up.

thomasp85 · 2020-04-30T20:14:02Z

I won't be going through all the old ones that already have tags... just wanted to take trip down memory lane... I think it would be nice to clear all these old issues out at some point though, so if you have the time... :-)

paleolimbot · 2020-04-30T20:19:32Z

Cool...will wait for the patch release and will see what's the most useful way I could contribute after that.

teunbrand · 2020-10-15T07:33:29Z

I played around with an implementation like @paleolimbot suggested, wherein the domain is transformed and then used to squish the limits in the ScaleContinuous$get_breaks() method.

This fixes the sqrt_trans() scale problem and works great for other numerical scales too. However, because date_trans() and time_trans() domains are numeric c(-Inf, Inf), they'll throw errors like the one below:

x <- scales::date_trans()
x$transform(x$domain)
#> Error: Invalid input: date_trans works with objects of class Date only

^{Created on 2020-10-15 by the reprex package (v0.3.0)}

I've come across this error before in e.g. #4155, or in an extension zeehio/facetscales#22, or workarounds in ggplot's code:

ggplot2/R/scale-expansion.r

Lines 146 to 149 in 885c3c1

    
           # using the inverse transform to resolve the NA value is needed for date/datetime/time 
        
           # scales, which refuse to transform objects of the incorrect type 
        
           coord_limits <- coord_limits %||% scale$trans$inverse(c(NA_real_, NA_real_)) 
        
           coord_limits_scale <- scale$trans$transform(coord_limits)

This makes me think perhaps an issue should be filed in the scales package asking for more leniency on the date_trans()/time_trans() transformations.

yutannihilation · 2020-10-15T11:16:56Z

This makes me think perhaps an issue should be filed in the scales package asking for more leniency on the date_trans()/time_trans() transformations.

I too feel this needs to be fixed on scales' side. Possible fixes are

make domain a Date/POSIXct/hms class (it seems they all accept Inf, though they are not printable. c.f. https://twitter.com/yutannihilat_en/status/1316678502880428032)
make transform() accept numeric values

Probably, domain should have the same class as that of the trans, so fix 1 seems straightforward. On the other hand, fix 2 let us use Inf like the code below.

set.seed(10)
df <- data.frame(
  date = Sys.Date() + 1:2,
  price = 1:2
)

ggplot(df, aes(date, price)) +
  geom_line() +
  # annotate("point", x = as.Date(Inf), y = 1.5, size = 30) works
  annotate("point", x = Inf, y = 1.5, size = 30)

hadley mentioned this issue Jul 23, 2015

coord_trans() bug causes empty plot #1168

Closed

BrianDiggs mentioned this issue Jul 27, 2015

Clean up axis expansion #1207

Closed

hadley mentioned this issue Jan 2, 2017

no tick mark at y=0 with scale_y_sqrt() #1962

Closed

hadley added the feature a feature request or enhancement label Jan 25, 2017

hadley mentioned this issue Jan 27, 2017

problem with test for monotonous transformation (secondary axis)? #2008

Closed

paleolimbot added the scales 🐍 label Jun 18, 2019

This was referenced Jun 18, 2019

Single strategy for scale expansion/limiting that works with transformations #3371

Closed

Consolidate coordinate scale expansion and limiting code #3380

Merged

Square-root transforms happily inverse-transforms negative values r-lib/scales#214

Closed

paleolimbot added the tidy-dev-day 🤓 Tidyverse Developer Day label Jun 25, 2019

paleolimbot mentioned this issue Oct 27, 2019

Breaks should be computed on pre-expanded range #3592

Open

paleolimbot mentioned this issue Dec 20, 2019

Release ggplot2 3.3.0 #3674

Closed

18 tasks

clauswilke mentioned this issue Sep 6, 2020

Axis label on 0 disappears on scale_*_sqrt() #4193

Closed

teunbrand mentioned this issue Oct 15, 2020

Feature request: flexibility of date and time transformations w.r.t. numeric input. r-lib/scales#298

Closed

teunbrand mentioned this issue May 14, 2021

Zero point not included in scale_*_sqrt() #4478

Closed

teunbrand mentioned this issue Mar 24, 2022

Allow scale_{x/y}_sqrt() to have breaks at 0. #4775

Merged

thomasp85 closed this as completed in #4775 May 17, 2022

teunbrand mentioned this issue Jun 3, 2024

When x values contain 0, using coord_trans("sqrt") removes the axis line #5919

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqrt_trans, scale limit expansion, and missing breaks #980

sqrt_trans, scale limit expansion, and missing breaks #980

BrianDiggs commented Jun 26, 2014

SamGG commented Aug 7, 2014

hadley commented Jun 12, 2015

BrianDiggs commented Jun 18, 2015

hadley commented Jun 19, 2015

BrianDiggs commented Jun 19, 2015

krlmlr commented Jul 28, 2015

BrianDiggs commented Jul 29, 2015

paleolimbot commented Jun 25, 2019

paleolimbot commented Jun 25, 2019

thomasp85 commented Apr 30, 2020

paleolimbot commented Apr 30, 2020

thomasp85 commented Apr 30, 2020

paleolimbot commented Apr 30, 2020

teunbrand commented Oct 15, 2020 •

edited

Loading

yutannihilation commented Oct 15, 2020

sqrt_trans, scale limit expansion, and missing breaks #980

sqrt_trans, scale limit expansion, and missing breaks #980

Comments

BrianDiggs commented Jun 26, 2014

Summary

Reproducible example:

Expected result

Actual results

Discussion

Workarounds

Don't square negative values

Squish range before inverting

Squish to range whenever values are extended

SamGG commented Aug 7, 2014

hadley commented Jun 12, 2015

BrianDiggs commented Jun 18, 2015

hadley commented Jun 19, 2015

BrianDiggs commented Jun 19, 2015

krlmlr commented Jul 28, 2015

BrianDiggs commented Jul 29, 2015

paleolimbot commented Jun 25, 2019

paleolimbot commented Jun 25, 2019

thomasp85 commented Apr 30, 2020

paleolimbot commented Apr 30, 2020

thomasp85 commented Apr 30, 2020

paleolimbot commented Apr 30, 2020

teunbrand commented Oct 15, 2020 • edited Loading

yutannihilation commented Oct 15, 2020

teunbrand commented Oct 15, 2020 •

edited

Loading