Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12526][SPARKR]ifelse, when, otherwise unable to take Column as value #10481

Closed
wants to merge 3 commits into from

Conversation

saurfang
Copy link
Contributor

ifelse, when, otherwise is unable to take Column typed S4 object as values.

For example:

ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)

will both fail with

attempt to replicate an object of type 'environment'

The PR replaces ifelse calls with if ... else ... inside the function implementations to avoid attempt to vectorize(i.e. rep()). It remains to be discussed whether we should instead support vectorization in these functions for consistency because ifelse in base R is vectorized but I cannot foresee any scenarios these functions will want to be vectorized in SparkR.

For reference, added test cases which trigger failures:

. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect': 
  error in evaluating the argument 'col' in selecting a method for function 'select': 
  attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse

1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted

@SparkQA
Copy link

SparkQA commented Dec 26, 2015

Test build #48336 has finished for PR 10481 at commit 449b0f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

Thanks @saurfang for the PR. I think this is a good enough fix as this is in an internal implementation detail and not visible to SparkR users ?

cc @sun-rui

@saurfang
Copy link
Contributor Author

Indeed. Like I said I don't see compelling reason to use these three functions in vectorized way. Let me know if you have any other comments on the fix.

@sun-rui
Copy link
Contributor

sun-rui commented Dec 28, 2015

The fix is good, but some style nit:
if (...) { ... } else { ... }

@@ -225,7 +225,7 @@ setMethod("%in%",
setMethod("otherwise",
signature(x = "Column", value = "ANY"),
function(x, value) {
value <- ifelse(class(value) == "Column", value@jc, value)
value <- if(class(value) == "Column") { value@jc } else { value }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between if and (

@saurfang
Copy link
Contributor Author

Thanks for the review @sun-rui. Hope that's better. Looks like lintr, as awesome as it is, let that slip through, which I have filed a separate issue here: r-lib/lintr#128

@SparkQA
Copy link

SparkQA commented Dec 29, 2015

Test build #48395 has finished for PR 10481 at commit 052cb51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

LGTM. Thanks @saurfang and @sun-rui - Merging this

asfgit pushed a commit that referenced this pull request Dec 29, 2015
…umn as value

`ifelse`, `when`, `otherwise` is unable to take `Column` typed S4 object as values.

For example:
```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```
will both fail with
```r
attempt to replicate an object of type 'environment'
```

The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempt to vectorize(i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency because `ifelse` in base R is vectorized but I cannot foresee any scenarios these functions will want to be vectorized in SparkR.

For reference, added test cases which trigger failures:
```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect':
  error in evaluating the argument 'col' in selecting a method for function 'select':
  attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse

1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted
```

Author: Forest Fang <[email protected]>

Closes #10481 from saurfang/spark-12526.

(cherry picked from commit d80cc90)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit asfgit closed this in d80cc90 Dec 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants