[SPARK-17838][SparkR] Check named arguments for options and use formatted R friendly message from JVM exception message #15608
Conversation
cc @felixcheung I recall we talked about this before. I first wanted to handle all of the argument type checking, but I decided to do only what I am pretty sure of (I remember we were concerned about sweeping changes). Could you please take a look?
Test build #67440 has finished for PR 15608 at commit …
```r
env[[name]] <- as.character(value)
nameList <- names(pairs)
ignoredNames <- list()
i <- 1
```
Why is `i` needed here?
Could you elaborate on the changes in this file? Why do we need them?
Oh, sure. It took me a while to find a clean way, but it ended up introducing another variable here...
`value <- pairs[[name]]` fails when `name` is an empty string (which is what happens for non-named arguments), producing the error below:

```r
Error in env[[name]] <- value :
  zero-length variable name
```
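For context, a minimal reproduction of that failure (grounded in the error above; the variable names here are illustrative only):

```r
# For unnamed varargs, the corresponding entry in names(list(...)) is "".
pairs <- list(a = 1, 2)    # one named argument, one unnamed
names(pairs)               # c("a", "")

# Assigning into an environment under a zero-length name is what fails.
env <- new.env()
name <- names(pairs)[2]    # "" for the unnamed argument
msg <- tryCatch({
  env[[name]] <- "value"
  "no error"
}, error = function(e) conditionMessage(e))
# msg contains "zero-length variable name"
```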
I wanted to access those problematic values so I could print them in:

```r
ignoredNames <- append(ignoredNames, pairs[i])
...
warning(paste0("Non-named arguments ignored: ", paste(ignoredNames, collapse = ", "), "."),
        call. = FALSE)
```
To cut it short, I introduced that variable to access the value when `name` is empty. Maybe there is a cleverer way, but I couldn't come up with one.
I see. One possible alternative is to iterate over the indices of the list, but use `lapply` instead of `for` (which is more R-like):

```r
lapply(seq_along(x), function(i) paste(names(x)[[i]], x[[i]]))
```
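A runnable sketch of that suggestion (the helper name `describePairs` is hypothetical, not part of SparkR):

```r
# Pair each value with its (possibly empty) name by iterating over
# indices, avoiding a manually maintained counter. Assumes x has a
# names attribute (names(x) is not NULL).
describePairs <- function(x) {
  unlist(lapply(seq_along(x), function(i) paste(names(x)[[i]], x[[i]])))
}

opts <- list(mode = "overwrite", 1)
described <- describePairs(opts)
# described[1] is "mode overwrite"; described[2] is " 1" (empty name)
```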
That looks nicer. Let me try that!
Thanks @HyukjinKwon. These are definitely good checks to have. Calls to read.* and write.* are not easily checked for parameters (file paths are hard), so to me it is better to leave the checking to the JVM and just handle the exception better. I wouldn't close this JIRA though, since it covers a broader case - those where we could check parameters before passing them to the JVM. Would you like to keep this JIRA open after this PR, or open a separate JIRA for this specific class of handling? One case I'm not sure about is what the Reader/Writer does with unnamed parameters - should we optionally fail here on the R side, instead of just ignoring them (in R or the JVM)?
I am fine with leaving the JIRA open. I can definitely try to open follow-ups. Otherwise, I can also convert the JIRA into a sub-task after introducing a parent JIRA. I will follow your lead. (If you close the JIRA, I will open another to make this a sub-task. If you don't, I will just try to make a follow-up in the future.)
I was worried about this part too, and it took me a while to build up my argument... My argument to convince myself was that currently we don't fail when unused arbitrary options are given (e.g. …).
re: JIRA - I don't mind one way or the other - both proposals sound good to me.
A valid point. Does it work properly when the Spark properties are set into a Java Properties class?
I guess you meant when the options are set, for example, in spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala (line 43 at 0c0ad43).
(BTW, it seems throwing …)
@felixcheung I just made it a single for loop (slightly different from the suggested one though). Could you please check it again?
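As an illustration, such a single loop could look roughly like this (a hedged sketch with a hypothetical name, not necessarily the code that was merged):

```r
# Copy only named arguments into an environment and warn about the
# unnamed ones, using one loop over indices.
varargsToStrEnvSketch <- function(...) {
  pairs <- list(...)
  env <- new.env()
  ignoredNames <- list()
  for (i in seq_along(pairs)) {
    name <- names(pairs)[i]
    if (is.null(name) || identical(name, "")) {
      # Unnamed argument: remember its value for the warning.
      ignoredNames <- append(ignoredNames, pairs[i])
    } else {
      env[[name]] <- as.character(pairs[[i]])
    }
  }
  if (length(ignoredNames) != 0) {
    warning(paste0("Unnamed arguments ignored: ",
                   paste(ignoredNames, collapse = ", "), "."),
            call. = FALSE)
  }
  env
}
```

For example, `varargsToStrEnvSketch(a = 1, 2, 3)` stores `a = "1"` in the environment and warns `Unnamed arguments ignored: 2, 3.`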
Test build #67570 has finished for PR 15608 at commit …
LGTM. I ran through a few other cases and I think the omitted names are handled properly with this.
Yup, it seems not even …
Merged to master.
[SPARK-17838][SparkR] Check named arguments for options and use formatted R friendly message from JVM exception message

## What changes were proposed in this pull request?

This PR proposes to

- improve the R-friendly error messages rather than exposing raw JVM exception ones. As `read.json`, `read.text`, `read.orc`, `read.parquet` and `read.jdbc` are executed in the same path as `read.df`, and `write.json`, `write.text`, `write.orc`, `write.parquet` and `write.jdbc` share the same path as `write.df`, it seems safe to call `handledCallJMethod` to handle JVM messages.
- prevent the `zero-length variable name` error and print the ignored options as a warning message.

**Before**

```r
> read.json("path", a = 1, 2, 3, "a")
Error in env[[name]] <- value :
  zero-length variable name
```

```r
> read.json("arbitrary_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Path does not exist: file:/...;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:398)
  ...

> read.orc("arbitrary_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Path does not exist: file:/...;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:398)
  ...

> read.text("arbitrary_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Path does not exist: file:/...;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:398)
  ...

> read.parquet("arbitrary_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Path does not exist: file:/...;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:398)
  ...
```

```r
> write.json(df, "existing_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: path file:/... already exists.;
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:68)

> write.orc(df, "existing_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: path file:/... already exists.;
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:68)

> write.text(df, "existing_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: path file:/... already exists.;
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:68)

> write.parquet(df, "existing_path")
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: path file:/... already exists.;
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:68)
```

**After**

```r
> read.json("arbitrary_path", a = 1, 2, 3, "a")
Unnamed arguments ignored: 2, 3, a.
```

```r
> read.json("arbitrary_path")
Error in json : analysis error - Path does not exist: file:/...

> read.orc("arbitrary_path")
Error in orc : analysis error - Path does not exist: file:/...

> read.text("arbitrary_path")
Error in text : analysis error - Path does not exist: file:/...

> read.parquet("arbitrary_path")
Error in parquet : analysis error - Path does not exist: file:/...
```

```r
> write.json(df, "existing_path")
Error in json : analysis error - path file:/... already exists.;

> write.orc(df, "existing_path")
Error in orc : analysis error - path file:/... already exists.;

> write.text(df, "existing_path")
Error in text : analysis error - path file:/... already exists.;

> write.parquet(df, "existing_path")
Error in parquet : analysis error - path file:/... already exists.;
```

## How was this patch tested?

Unit tests in `test_utils.R` and `test_sparkSQL.R`.

Author: hyukjinkwon <[email protected]>

Closes apache#15608 from HyukjinKwon/SPARK-17838.
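The error-message translation can be sketched roughly as follows (hypothetical names throughout; SparkR's actual `handledCallJMethod` wraps calls into the JVM, whereas this sketch wraps a plain R function that fails with a JVM-style message):

```r
# Rewrite a known JVM exception pattern into a short R-friendly error.
handledCallSketch <- function(method, f, ...) {
  tryCatch(f(...), error = function(e) {
    msg <- conditionMessage(e)
    marker <- "org.apache.spark.sql.AnalysisException: "
    if (grepl(marker, msg, fixed = TRUE)) {
      # Keep only the first line of the exception detail, dropping
      # the exception class name and the stack trace.
      detail <- strsplit(msg, marker, fixed = TRUE)[[1]][2]
      firstLine <- strsplit(detail, "\n", fixed = TRUE)[[1]][1]
      stop(paste0(method, " : analysis error - ", firstLine), call. = FALSE)
    }
    stop(e)  # unrecognized errors pass through unchanged
  })
}

# A stand-in for a failing JVM call, mimicking the "Before" output:
fakeJvmCall <- function(path) {
  stop(paste0("org.apache.spark.sql.AnalysisException: Path does not exist: ",
              path, ";\n\tat org.apache.spark.sql.execution.datasources.DataSource"))
}

result <- tryCatch(handledCallSketch("json", fakeJvmCall, "file:/tmp/nope"),
                   error = function(e) conditionMessage(e))
# result: "json : analysis error - Path does not exist: file:/tmp/nope;"
```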