
[Spark-10625] [SQL] Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties #8785

Closed
tribbloid wants to merge 1 commit into master from the SPARK-10625 branch

Conversation

tribbloid

Connection properties are now deep-copied before they are used by JDBC drivers; this solves all problems in the unit tests.

@@ -75,6 +76,19 @@ private[sql] object JDBCRelation {
}
ans.toArray
}

def getEffectiveProperties(
connectionProperties: Properties,
Contributor

The indentation here seems off (you may wish to run ./dev/lint-scala)

@JoshRosen
Contributor

Hey, do you mind giving this PR a descriptive title? Makes the PR queue easier to scan.

@tribbloid tribbloid changed the title Spark 10625 Spark 10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Sep 17, 2015
@tribbloid tribbloid changed the title Spark 10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Spark-10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Sep 17, 2015
@tribbloid
Author

Does it look any better now?


def getEffectiveProperties(
Contributor

This indentation is still a little funky; see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

Member

@tribbloid I think Holden's comment still stands -- see how other methods wrap args. I also don't think you need to fully-qualify scala.collection.Map here?
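
For reference, a minimal illustration (not from the patch) of why the qualification matters at all: scala.collection.Map is the common supertype of the immutable and mutable map types, so a parameter typed this way accepts either, whereas a bare Map resolves to the immutable one via Predef. The function name below is hypothetical:

```scala
// A bare `Map` parameter type would mean scala.collection.immutable.Map;
// the qualified supertype accepts both flavors.
def describe(options: scala.collection.Map[String, String]): String =
  options.map { case (k, v) => s"$k=$v" }.mkString(", ")

describe(Map("user" -> "sa"))                                 // immutable: OK
describe(scala.collection.mutable.Map("fetchsize" -> "100"))  // mutable: also OK
```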

Contributor

+1 on Sean's comments. Also, could you add a one- or two-line comment to explain what's going on here? Maybe give this method Scaladoc?
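
For illustration, here is a sketch of what the method might look like with the requested Scaladoc. This is an assumption-laden reconstruction, not the actual patch; the merging of extraOptions is inferred from the DataFrameWriter hunk quoted further down:

```scala
import java.util.Properties
import scala.collection.JavaConverters._

/**
 * Returns a new Properties object holding the extra options plus a deep copy
 * of the user-supplied connection properties. Because the copy is made before
 * any JDBC driver touches it, a driver that mutates the properties can never
 * render the caller's original object unserializable.
 */
def getEffectiveProperties(
    connectionProperties: Properties,
    extraOptions: scala.collection.Map[String, String] = Map()): Properties = {
  val props = new Properties()
  extraOptions.foreach { case (k, v) => props.setProperty(k, v) }
  // stringPropertyNames() also walks the source object's defaults chain
  connectionProperties.stringPropertyNames().asScala.foreach { name =>
    props.setProperty(name, connectionProperties.getProperty(name))
  }
  props
}
```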

@holdenk
Contributor

holdenk commented Sep 18, 2015

Looks like good improvements and less duplicated code, although there are still some minor style issues. You might want to merge in the latest master branch so that tests can be run.
Also, as a note: to help relevant reviewers find your PR faster, https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest has some steps on how to name the PR (essentially [SPARK-ISSUENUMBER][COMPONENT] - description). In this case, starting the PR's title with [SPARK-10625][SQL] will help make it visible to the people most able to help with the review.

@tribbloid
Author

All fixed (except the PR title, which I'll fix soon). Thanks a lot for pointing them out!

@tribbloid tribbloid changed the title Spark-10625: Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties [Spark-10625] [SQL] Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties Sep 22, 2015
@tribbloid
Author

PR title fixed as well, thanks a lot Holden!

@@ -280,7 +275,8 @@ final class DataFrameWriter private[sql](df: DataFrame) {
conn.close()
}

JdbcUtils.saveTable(df, url, table, props)
val props2 = JDBCRelation.getEffectiveProperties(connectionProperties, this.extraOptions)
Member

Why does this need to be created twice? Isn't props the same?

@srowen
Member

srowen commented Nov 27, 2015

I guess I'm missing why a deep copy solves the problem of an unserializable property value. It still exists in the copy, right?

@tribbloid
Author

Hi Sean, the properties object is only mutated by the JDBC driver's schema-fetching call on the Spark driver side, after which it's no longer serializable. My deep copy is made before that mutation.
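
To make the failure mode concrete: java.util.Properties is a Hashtable[AnyRef, AnyRef], so a driver can stash an arbitrary non-serializable object into it during connect(), and serializing that same object afterwards fails. A minimal standalone demonstration (key names are illustrative, not from the patch):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.Properties

val props = new Properties()
props.setProperty("user", "sa")
// What a misbehaving driver can do internally: Properties is a raw
// Hashtable, so any object, serializable or not, can be inserted.
props.put("driver.internal.context", new Object())

val out = new ObjectOutputStream(new ByteArrayOutputStream())
out.writeObject(props) // throws java.io.NotSerializableException: java.lang.Object
```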

import org.apache.spark.util.Utils
import org.scalatest.BeforeAndAfter
Member

Nit: most of the import changes in this PR are actually the wrong way; can you restore the ordering?

@srowen
Member

srowen commented Dec 5, 2015

@tribbloid are you still working on this? I had an outstanding question or two here

@tribbloid
Author

@srowen yeah, I'll reply shortly; sorry, I wasn't aware of the email for a week.

@tribbloid tribbloid force-pushed the SPARK-10625 branch 2 times, most recently from 7f5df5e to 267afca on December 7, 2015 at 23:12
@tribbloid
Author

@srowen thanks a lot for pointing out the problem in the import declarations; I've already corrected it and will stop habitually optimizing imports.

@tribbloid
Author

@srowen for your second question: the connection properties are deep-copied twice to ensure that the original object is never mutated. Reverting this breaks the scenario where the same properties object is used in two JDBC writes in short sequence. I haven't included this scenario in a unit test yet; would you prefer that I do, or is this not expected usage?

@srowen
Member

srowen commented Dec 7, 2015

I don't get that -- what does the original object matter if it's copied here? And how would the copy change?

@tribbloid
Author

@srowen Sorry, you are right; there is already a deep copy and I should just use that. Will correct immediately.


val oldDrivers = DriverManager.getDrivers.asScala.filter(_.acceptsURL("jdbc:h2:"))
oldDrivers.foreach(DriverManager.deregisterDriver)
DriverManager.registerDriver(UnserializableH2Driver)
Contributor

Should probably be inside of the try block, no?

@tribbloid
Author

Good! Boots on the ground.
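
For clarity, a sketch of the structure the reviewer is suggesting: register the replacement driver inside the try so the original drivers are restored even if the body throws. UnserializableH2Driver is the test's driver object, defined elsewhere in the suite:

```scala
import java.sql.DriverManager
import scala.collection.JavaConverters._

// Materialize the enumeration up front: the wrapped iterator is single-use.
val oldDrivers =
  DriverManager.getDrivers.asScala.filter(_.acceptsURL("jdbc:h2:")).toList
try {
  oldDrivers.foreach(DriverManager.deregisterDriver)
  DriverManager.registerDriver(UnserializableH2Driver)
  // ... run the test body against the unserializable driver ...
} finally {
  DriverManager.deregisterDriver(UnserializableH2Driver)
  oldDrivers.foreach(DriverManager.registerDriver)
}
```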

@tribbloid
Author

Rebased with minimal changes to code style (reverted the whitespace changes and corrected the import order). All tests passed. Let's finish this.

@JoshRosen
Contributor

Hey, there are still a bunch of review comments I left that haven't been acknowledged or addressed! Mind replying to them?

@tribbloid tribbloid force-pushed the SPARK-10625 branch 2 times, most recently from 7a80214 to 7bae97c on January 27, 2016 at 19:28
@tribbloid
Author

Yes, all comments resolved.

@tribbloid
Author

Hi Josh, do you see any problems? Jenkins should have tested it, and it's clean to be merged by now.

@@ -27,6 +27,8 @@ import org.apache.spark.sql.{DataFrame, Row, SaveMode, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types.StructType

import scala.collection.JavaConverters._
Contributor

@tribbloid
Author

Finally someone replied :)
Do you suggest I fix it now, or wait until the 2.0 RC code has become a bit more stable (but not frozen)?

@AmplabJenkins

Can one of the admins verify this patch?

@tribbloid
Author

Please wait for me to address the conflicts; I will do this after the 2.0.0-preview main components become stable enough.

@holdenk
Contributor

holdenk commented Jul 6, 2016

So 2.0.0-preview is already out and we are in RC2, so I wouldn't expect any big changes right now, if you want to take the time to update the PR :)

@tribbloid
Author

Absolutely lady, & welcome to Toronto.


@tribbloid
Author

All conflicts fixed, with minimal changes to the original patch that was peer-reviewed in Jan 2016. Requesting merge.

WARNING: DataFrameWriter line 402 (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L402) could use JDBCRelation.getEffectiveProperties at line 106 (https://github.com/apache/spark/pull/8785/files#diff-5f0d0643fcfad315df0fdd7cae52dfaeR106), but I didn't change it, to minimize the diff. Please advise if it has to be corrected.

Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties

add one more unit test
fix JDBCRelation & DataFrameWriter to pass all tests

revise scala style

put driver replacement code into a shared function

fix styling

upgrade to master and resolve all related issues

remove the useless second deep copy of properties
rename test names to be more explicit

minor refactoring based on Sean's suggestion

move JavaConverters import to under object

remove the redundant toSeq and pull up lines in brackets

improve styling in UnserializableDriverHelper and JDBCRelation

remove whitespace in JDBCRelation line 42

add back the type qualifier on the getEffectiveProperties parameter in JDBCRelation to allow a mutable Map to be used.

fix a unit test error: DriverManager.getDrivers.asScala returns an iterator that can only be traversed once; this commit converts it into a list so it can be reused (see the sketch after this commit list)

reformat import & styling
fix an API invocation error
remove several getEffectiveProperties invocations as deep copies are already implemented in some functions.
change test name to start with "SPARK-10625"
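
As a small illustration of the single-use iterator pitfall fixed in the commit above (using only the JDK and the Scala collection converters):

```scala
import java.sql.DriverManager
import scala.collection.JavaConverters._

val once = DriverManager.getDrivers.asScala  // wraps a java.util.Enumeration
println(once.size)                           // counting consumes the iterator
once.foreach(println)                        // prints nothing: already exhausted

val reusable = DriverManager.getDrivers.asScala.toList  // materialized copy
println(reusable.size)                       // safe to traverse repeatedly
reusable.foreach(d => println(d.getClass.getName))
```
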
@rxin
Contributor

rxin commented Dec 7, 2016

@tribbloid is this a problem that needs to be fixed?

@tribbloid
Author

tribbloid commented Dec 7, 2016 via email

@srowen
Member

srowen commented Dec 7, 2016

This needs a rebase and there are still outstanding review comments (minor ones)

@HyukjinKwon
Member

ping @tribbloid. Are you able to proceed with the review comments? If not, it'd be better to close this for now.

@tribbloid
Author

tribbloid commented Feb 9, 2017

@HyukjinKwon yeah, I just need to rebase on 2.2-SNAPSHOT+.
Are you going to merge immediately after the rebase + syntax validation + full unit tests? If it drags on for too long, the patch becomes obsolete.

@HyukjinKwon
Member

I am not supposed to decide what to merge, but I left the comment because this PR seems inactive with respect to the review comments, and I assumed it is currently abandoned, with the author unable to proceed further for now.

I'd rebase, address the review comments, and keep pinging the relevant people here.

srowen added a commit to srowen/spark that referenced this pull request Mar 22, 2017
@srowen srowen mentioned this pull request Mar 22, 2017
@asfgit asfgit closed this in b70c03a Mar 23, 2017