Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility of sketch with magrittr 2.0 #5

Closed
lionel- opened this issue Nov 16, 2020 · 5 comments
Closed

Compatibility of sketch with magrittr 2.0 #5

lionel- opened this issue Nov 16, 2020 · 5 comments

Comments

@lionel-
Copy link

lionel- commented Nov 16, 2020

Hello,

I'm seeing in reverse dependency checks on the cloud that sketch fails with magrittr 2.0.

    checking examples ... ERROR

    Running examples in ‘sketch-Ex.R’ failed
    The error most likely occurred in:

    > ### Name: compile_r
    > ### Title: Compile an R file into a JavaScript file
    > ### Aliases: compile_r
    > 
    > ### ** Examples
    > 
    > file <- system.file("test_files/test_source.R", package = "sketch")
    > readLines(file)
    [1] "console::log(\"'test_source.R' runs successfully.\")"
    > compile_r(input = file)
    Error: C stack usage  7969652 is too close to the limit
    Execution halted

    checking tests ... ERROR

      Running ‘testthat.R’
    Running the tests in ‘tests/testthat.R’ failed.
    Complete output:
      > library(testthat)
      > library(sketch)
      > 
      > test_check("sketch")
      Error: C stack usage  7970324 is too close to the limit
      Execution halted

Unfortunately I can't reproduce locally. Could you take a look please? The notes about compatibility with magrittr 2.0 in https://github.com/tidyverse/magrittr/blob/master/NEWS.md might be useful.

@kcf-jackson
Copy link
Owner

Thanks for the notice.

I managed to reproduce the error. It seems the problem is caused by the switch of the implementation of magrittr::freduce from a for-loop in version 1.5

freduce <- function(value, function_list)
{
  k <- length(function_list)
  if (k > 1) {
    for (i in 1:(k - 1L)) {
      value <- function_list[[i]](value)  
    }
  }
  value <- withVisible(function_list[[k]](value))
  if (value[["visible"]]) value[["value"]] else invisible(value[["value"]])
}

to a recursive call in version 2.0.1

freduce <- function(value, function_list)
{
  if (length(function_list) == 1L)
    function_list[[1L]](value)
  else 
    Recall(function_list[[1L]](value), function_list[-1L])
}

Under the new implementation, if the function list is too long, then the memory stack is exhausted.

Reproducible example

The following is the reproducible example. You may need to use a higher number for the parameter n to hit the error.

# Substitute character 'a' by character 'b'
a_to_b <- function(x) {
    gsub(pattern = "a", replacement = "b", x)
}

# Repeat a function n times
rep_func <- function(f, n) {
    flist <- list()
    for (i in 1:n) {
        flist[[i]] <- a_to_b
    }
    return(flist)
}

flist <- rep_func(a_to_b, 300)
magrittr::freduce("abcde", flist)
# [1] "bbcde"

flist <- rep_func(a_to_b, 400)
magrittr::freduce("abcde", flist)
# Error: C stack usage  7969648 is too close to the limit

Benchmarking

Here is some further benchmarking. Version 1.5 can handle a list over 100K, while version 2.0.1 hits the limit at 377 with my computer.

# "magrittr" version 1.5
# install.packages("magrittr")
> packageVersion("magrittr")
[1] ‘1.5’
> flist <- rep_func(a_to_b, 100000)
> magrittr::freduce("abcde", flist)
[1] "bbcde"


# "magrittr" version 2.0.1
# remotes::install_github("tidyverse/magrittr")
> packageVersion("magrittr")
[1] ‘2.0.1’
> flist <- rep_func(a_to_b, 376)
> magrittr::freduce("abcde", flist)
[1] "bbcde"
> flist <- rep_func(a_to_b, 377)
> magrittr::freduce("abcde", flist)
Error: C stack usage  7970704 is too close to the limit

Thanks.

@lionel-
Copy link
Author

lionel- commented Nov 17, 2020

Hmm, I might just roll back to the old implementation. What do you think @smbache?

@smbache
Copy link

smbache commented Nov 17, 2020

I seem to remember that I used recursion as well initially, but changed to a loop. Could be wrong. But could be either something similar, or speed that made me change it (as I generally try to avoid loops unless there's a good reason).

lionel- added a commit to tidyverse/magrittr that referenced this issue Nov 17, 2020
@lionel-
Copy link
Author

lionel- commented Nov 17, 2020

Should be fixed, thanks for the quick and thorough feedback @kcf-jackson!

@lionel- lionel- closed this as completed Nov 17, 2020
@smbache
Copy link

smbache commented Nov 17, 2020

Can't seem to consolidate the git history with my memory, but in any case I think it's fine to roll back.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 14, 2020
# magrittr 2.0.1

* Fixed issue caused by objects with certain names being present in
  the calling environment (#233).

* Fixed regression in `freduce()` with long lists (kcf-jackson/sketch#5).


# magrittr 2.0.0

## Fast and lean implementation of the pipe

The pipe has been rewritten in C with the following goals in mind:

- Minimal performance cost.
- Minimal impact on backtraces.
- No impact on reference counts.

As part of this rewrite we have changed the behaviour of the pipe to
make it closer to the implementation that will likely be included in a
future version of R. The pipe now evaluates piped expressions lazily (#120).
The main consequence of this change is that warnings and errors can
now be handled by trailing pipe calls:

```r
stop("foo") %>% try()
warning("bar") %>% suppressWarnings()
```


## Breaking changes

The pipe rewrite should generally not affect your code. We have
checked magrittr on 2800 CRAN packages and found only a dozen of
failures. The development version of magrittr has been advertised on
social media for a 3 months trial period, and no major issues were
reported. However, there are some corner cases that might require
updating your code. Below is a report of the backward
incompatibilities we found in real code to help you transition, should
you find an issue in your code.


### Behaviour of `return()` in a pipeline

In previous versions of magrittr, the behaviour of `return()` within
pipe expressions was undefined. Should it return from the current pipe
expression, from the whole pipeline, or from the enclosing function?
The behaviour that makes the most sense is to return from the
enclosing function. However, we can't make this work easily with the
new implementation, and so calling `return()` is now an error.

```r
my_function <- function(x) {
  x %>% {
    if (.) return("true")
    "false"
  }
}

my_function(TRUE)
#> Error: no function to return from, jumping to top level
```

In magrittr 1.5, `return()` used to return from the current pipe
expression. You can rewrite this to the equivalent:

```r
my_function <- function(x) {
  x %>% {
    if (.) {
      "true"
    } else {
      "false"
    }
  }
}

my_function(TRUE)
#> [1] "true"
```

For backward-compatibility we have special-cased trailing `return()`
calls as this is a common occurrence in packages:

```r
1 %>% identity() %>% return()
```

Note however that this only returns from the pipeline, not the
enclosing function (which is the historical behaviour):

```r
my_function <- function() {
  "value" %>% identity() %>% return()
  "wrong value"
}

my_function()
#> [1] "wrong value"
```

It is generally best to avoid using `return()` in a pipeline, even if
trailing.


### Failures caused by laziness

With the new lazy model for the evaluation of pipe expressions,
earlier parts of a pipeline are not yet evaluated when the last pipe
expression is called. They only get evaluated when the last function
actually uses the piped arguments:

```r
ignore <- function(x) "return value"
stop("never called") %>% ignore()
#> [1] "return value"
```

This should generally not cause problems. However we found some
functions with special behaviour, written under the assumption that
earlier parts of the pipeline were already evaluated and had already
produced side effects. This is generally incorrect behaviour because
that means that these functions do not work properly when called
with the nested form, e.g. `f(g(1))` instead of `1 %>% g() %>% f()`.

The solution to fix this is to call `force()` on the inputs to force
evaluation, and only then check for side effects:

```r
my_function <- function(data) {
  force(data)
  peek_side_effect()
}
```

Another issue caused by laziness is that if any function in a pipeline
returns invisibly, than the whole pipeline returns invisibly as well.

```r
1 %>% identity() %>% invisible()
1 %>% invisible() %>% identity()
1 %>% identity() %>% invisible() %>% identity()
```

This is consistent with the equivalent nested code. This behaviour can
be worked around in two ways. You can force visibility by wrapping the
pipeline in parentheses:

```r
my_function <- function(x) {
  (x %>% invisible() %>% identity())
}
```

Or by assigning the result to a variable and return it:

```r
my_function <- function(x) {
  out <- x %>% invisible() %>% identity()
  out
}
```


### Incorrect call stack introspection

The magrittr expressions are no longer evaluated in frames that can be
inspected by `sys.frames()` or `sys.parent()`. Using these functions
for implementing actual functionality (as opposed as debugging tools)
is likely to produce bugs. Instead, you should generally use
`parent.frame()` which works even when R code is called from
non-inspectable frames. This happens with e.g. `do.call()` and the new
C implementation of magrittr.


### Incorrect assumptions about magrittr internals

Some packages were depending on how magrittr was internally
structured. Robust code should only use the documented and exported
API of other packages.


## Bug fixes

* Can now use the placeholder `.` with the splicing operator `!!!`
  from rlang (#191).

* Piped arguments are now persistent. They can be evaluated after the
  pipeline has returned, which fixes subtle issues with function
  factories (#159, #195).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants