
[Spark-10625] [SQL] Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties #8785

Closed
tribbloid wants to merge 1 commit into master from the SPARK-10625 branch

Conversation

tribbloid

Connection properties are now deep-copied before they are used by JDBC drivers; this solves all problems in the unit tests.

@@ -75,6 +76,19 @@ private[sql] object JDBCRelation {
}
ans.toArray
}

def getEffectiveProperties(
connectionProperties: Properties,
Contributor

The indentation here seems off (you may wish to run ./dev/lint-scala)

@JoshRosen
Contributor

Hey, do you mind giving this PR a descriptive title? Makes the PR queue easier to scan.

@tribbloid tribbloid changed the title Spark 10625 Spark 10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Sep 17, 2015
@tribbloid tribbloid changed the title Spark 10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Spark-10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Sep 17, 2015
@tribbloid
Author

Does it look any better now?


def getEffectiveProperties(
Contributor

This indentation is still a little funky; see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

Member

@tribbloid I think Holden's comment still stands -- see how other methods wrap args. I also don't think you need to fully-qualify scala.collection.Map here?
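
For reference, a minimal illustration (not from the patch) of why the qualification matters at all: scala.collection.Map is the common supertype of the immutable and mutable map types, so a parameter typed this way accepts either, whereas a bare Map resolves to the immutable one via Predef. The function name below is hypothetical:

```scala
// A bare `Map` parameter type would mean scala.collection.immutable.Map;
// the qualified supertype accepts both flavors.
def describe(options: scala.collection.Map[String, String]): String =
  options.map { case (k, v) => s"$k=$v" }.mkString(", ")

describe(Map("user" -> "sa"))                                 // immutable: OK
describe(scala.collection.mutable.Map("fetchsize" -> "100"))  // mutable: also OK
```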

Contributor

+1 on Sean's comments. Also, could you add a one- or two-line comment to explain what's going on here? Maybe give this method Scaladoc?
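
For illustration, here is a sketch of what the method might look like with the requested Scaladoc. This is an assumption-laden reconstruction, not the actual patch; the merging of extraOptions is inferred from the DataFrameWriter hunk quoted further down:

```scala
import java.util.Properties
import scala.collection.JavaConverters._

/**
 * Returns a new Properties object holding the extra options plus a deep copy
 * of the user-supplied connection properties. Because the copy is made before
 * any JDBC driver touches it, a driver that mutates the properties can never
 * render the caller's original object unserializable.
 */
def getEffectiveProperties(
    connectionProperties: Properties,
    extraOptions: scala.collection.Map[String, String] = Map()): Properties = {
  val props = new Properties()
  extraOptions.foreach { case (k, v) => props.setProperty(k, v) }
  // stringPropertyNames() also walks the source object's defaults chain
  connectionProperties.stringPropertyNames().asScala.foreach { name =>
    props.setProperty(name, connectionProperties.getProperty(name))
  }
  props
}
```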

@holdenk
Contributor

holdenk commented Sep 18, 2015

Looks like good improvements and less duplicated code, although there are still some minor style issues. You might want to merge in the latest master branch so that tests can be run.
Also, as a note: to help relevant reviewers find your PR faster, https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest has some steps on how to name the PR (essentially [SPARK-ISSUENUMBER][COMPONENT] - description). In this case, starting the PR's title with [SPARK-10625][SQL] will help make it visible to the people most able to help with the review.

@tribbloid
Author

All fixed (except the PR title, which I'll fix soon). Thanks a lot for pointing them out!

@tribbloid tribbloid changed the title Spark-10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties [Spark-10625] [SQL] Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Sep 22, 2015
@tribbloid
Author

PR title fixed as well, thanks a lot Holden!

@@ -280,7 +275,8 @@ final class DataFrameWriter private[sql](df: DataFrame) {
conn.close()
}

JdbcUtils.saveTable(df, url, table, props)
val props2 = JDBCRelation.getEffectiveProperties(connectionProperties, this.extraOptions)
Member

Why does this need to be created twice? Isn't props the same?

@srowen
Member

srowen commented Nov 27, 2015

I guess I'm missing why a deep copy solves the problem of an unserializable property value. It still exists in the copy, right?

@tribbloid
Author

Hi Sean, the properties object is only mutated by the JDBC driver's schema-fetching call on the Spark driver side, after which it's no longer serializable. My deep copy is made before that mutation.
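
To make the failure mode concrete: java.util.Properties is a Hashtable[AnyRef, AnyRef], so a driver can stash an arbitrary non-serializable object into it during connect(), and serializing that same object afterwards fails. A minimal standalone demonstration (key names are illustrative, not from the patch):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.Properties

val props = new Properties()
props.setProperty("user", "sa")
// What a misbehaving driver can do internally: Properties is a raw
// Hashtable, so any object, serializable or not, can be inserted.
props.put("driver.internal.context", new Object())

val out = new ObjectOutputStream(new ByteArrayOutputStream())
out.writeObject(props) // throws java.io.NotSerializableException: java.lang.Object
```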

import org.apache.spark.util.Utils
import org.scalatest.BeforeAndAfter
Member

Nit: most of the import changes in this PR are actually the wrong way; can you restore the ordering?

@srowen
Member

srowen commented Dec 5, 2015

@tribbloid are you still working on this? I had an outstanding question or two here

@tribbloid
Author

@srowen yeah, I'll reply shortly; sorry, I wasn't aware of the email for a week.

@tribbloid tribbloid force-pushed the SPARK-10625 branch 2 times, most recently from 7f5df5e to 267afca on December 7, 2015 at 23:12
@tribbloid
Author

@srowen thanks a lot for pointing out the problem in the import declarations; I've already corrected it and will stop habitually optimizing imports.

@tribbloid
Author

@srowen for your second question: the connection properties are deep-copied twice to ensure that the original object is never mutated. Reverting this breaks the scenario where the same properties object is used in two JDBC writes in short sequence. I haven't included this scenario in a unit test yet; would you prefer that I do, or is this not expected usage?

@srowen
Member

srowen commented Dec 7, 2015

I don't get that -- what does the original object matter if it's copied here? And how would the copy change?

@tribbloid
Author

@srowen Sorry, you are right; there is already a deep copy and I should just use that. Will correct immediately.


val oldDrivers = DriverManager.getDrivers.asScala.filter(_.acceptsURL("jdbc:h2:"))
oldDrivers.foreach(DriverManager.deregisterDriver)
DriverManager.registerDriver(UnserializableH2Driver)
Contributor

Should probably be inside of the try block, no?

@tribbloid
Author

Good! Boots on the ground.
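
For clarity, a sketch of the structure the reviewer is suggesting: register the replacement driver inside the try so the original drivers are restored even if the body throws. UnserializableH2Driver is the test's driver object, defined elsewhere in the suite:

```scala
import java.sql.DriverManager
import scala.collection.JavaConverters._

// Materialize the enumeration up front: the wrapped iterator is single-use.
val oldDrivers =
  DriverManager.getDrivers.asScala.filter(_.acceptsURL("jdbc:h2:")).toList
try {
  oldDrivers.foreach(DriverManager.deregisterDriver)
  DriverManager.registerDriver(UnserializableH2Driver)
  // ... run the test body against the unserializable driver ...
} finally {
  DriverManager.deregisterDriver(UnserializableH2Driver)
  oldDrivers.foreach(DriverManager.registerDriver)
}
```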

@tribbloid
Author

Rebased with minimal changes to code style (reverted the whitespace changes and corrected the import order). All tests passed. Let's finish this.

@JoshRosen
Contributor

Hey, there are still a bunch of review comments I left that haven't been acknowledged or addressed! Mind replying to them?

@tribbloid tribbloid force-pushed the SPARK-10625 branch 2 times, most recently from 7a80214 to 7bae97c on January 27, 2016 at 19:28
@tribbloid
Author

Yes, all comments resolved.

@tribbloid
Author

Hi Josh, do you see any problems? Jenkins should have tested it, and it's clean to be merged by now.

@@ -27,6 +27,8 @@ import org.apache.spark.sql.{DataFrame, Row, SaveMode, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types.StructType

import scala.collection.JavaConverters._
Contributor

@tribbloid
Author

Finally someone replied :)
Do you suggest I fix it now, or wait until the 2.0 RC code has become a bit more stable (but not frozen)?

@AmplabJenkins

Can one of the admins verify this patch?

@tribbloid
Author

Please wait for me to address the conflicts; I will do this after the 2.0.0-preview main components become stable enough.

@holdenk
Contributor

holdenk commented Jul 6, 2016

So 2.0.0-preview is already out and we are in RC2, so I wouldn't expect any big changes right now, if you want to take the time to update the PR :)

@tribbloid
Author

Absolutely lady, & welcome to Toronto.


@tribbloid
Author

All conflicts fixed, with minimal changes to the original patch that was peer-reviewed in Jan 2016. Requesting merge.

WARNING: DataFrameWriter line 402 (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L402) could use JDBCRelation.getEffectiveProperties at line 106 (https://github.com/apache/spark/pull/8785/files#diff-5f0d0643fcfad315df0fdd7cae52dfaeR106), but I didn't change it, to minimize the diff. Please advise if it has to be corrected.

Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties

add one more unit test
fix JDBCRelation & DataFrameWriter to pass all tests

revise scala style

put driver replacement code into a shared function

fix styling

upgrade to master and resolve all related issues

remove the useless second deep copy of properties
rename test names to be more explicit

minor refactoring based on Sean's suggestion

move JavaConverters import to under object

remove the redundant toSeq and pull up lines in brackets

improve styling in UnserializableDriverHelper and JDBCRelation

remove whitespace in JDBCRelation line 42

add back the type qualifier on the getEffectiveProperties parameter in JDBCRelation to allow a mutable Map to be used.

fix a unit test error: DriverManager.getDrivers.asScala returns an iterator that can only be traversed once; this commit converts it into a list so it can be reused (see the sketch after this commit list)

reformat import & styling
fix an API invocation error
remove several getEffectiveProperties invocations as deep copies are already implemented in some functions.
change test name to start with "SPARK-10625"
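
As a small illustration of the single-use iterator pitfall fixed in the commit above (using only the JDK and the Scala collection converters):

```scala
import java.sql.DriverManager
import scala.collection.JavaConverters._

val once = DriverManager.getDrivers.asScala  // wraps a java.util.Enumeration
println(once.size)                           // counting consumes the iterator
once.foreach(println)                        // prints nothing: already exhausted

val reusable = DriverManager.getDrivers.asScala.toList  // materialized copy
println(reusable.size)                       // safe to traverse repeatedly
reusable.foreach(d => println(d.getClass.getName))
```
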
@rxin
Contributor

rxin commented Dec 7, 2016

@tribbloid is this a problem that needs to be fixed?

@tribbloid
Author

tribbloid commented Dec 7, 2016 via email

@srowen
Member

srowen commented Dec 7, 2016

This needs a rebase and there are still outstanding review comments (minor ones)

@HyukjinKwon
Member

ping @tribbloid. Are you able to proceed with the review comments? If not, it'd be better to close this for now.

@tribbloid
Author

tribbloid commented Feb 9, 2017

@HyukjinKwon yeah, I just need to rebase on 2.2-SNAPSHOT+.
Are you going to merge immediately after the rebase + syntax validation + full unit tests? If it drags on for too long, the patch becomes obsolete.

@HyukjinKwon
Member

I am not supposed to decide what to merge, but I left the comment because this PR seems inactive with respect to the review comments, and I assumed it is currently abandoned, with the author unable to proceed further for now.

I'd rebase, address the review comments, and keep pinging the relevant people here.

srowen added a commit to srowen/spark that referenced this pull request Mar 22, 2017
@srowen srowen mentioned this pull request Mar 22, 2017
@asfgit asfgit closed this in b70c03a Mar 23, 2017