[Spark-10625] [SQL] Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties #8785
Conversation
@@ -75,6 +76,19 @@ private[sql] object JDBCRelation {
    }
    ans.toArray
  }

  def getEffectiveProperties(
      connectionProperties: Properties,
The indentation here seems off (you may wish to run ./dev/lint-scala)
Hey, do you mind giving this PR a descriptive title? Makes the PR queue easier to scan.
Does it look any better now?
@@ -75,6 +76,19 @@ private[sql] object JDBCRelation {
    }
    ans.toArray
  }

  def getEffectiveProperties(
this indentation is still a little funky, see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
@tribbloid I think Holden's comment still stands -- see how other methods wrap args. I also don't think you need to fully-qualify scala.collection.Map here?
+1 on Sean's comments. Also, could you add a one- or two-line comment to explain what's going on here? Maybe give this method Scaladoc?
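A hedged sketch of what such a helper could look like with the requested Scaladoc (the name `getEffectiveProperties` comes from this PR, but the body, default argument, and merge order here are illustrative assumptions, not Spark's exact implementation):

```scala
import java.util.Properties

/**
 * Returns a fresh Properties object built from `extraOptions` overlaid with
 * `connectionProperties`, so that a JDBC driver mutating the result cannot
 * leak changes back into the caller's original object.
 */
def getEffectiveProperties(
    connectionProperties: Properties,
    extraOptions: scala.collection.Map[String, String] = Map.empty): Properties = {
  val props = new Properties()
  // Extra options first, so explicit connection properties win on key clashes.
  extraOptions.foreach { case (k, v) => props.setProperty(k, v) }
  props.putAll(connectionProperties)
  props
}
```

Because a new `Properties` is returned on every call, each read/write operation can mutate its own copy freely.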
Looks like good improvements, less duplicated code, although there are still some minor style issues. You might want to merge in the latest master branch so that tests can be run.
Force-pushed from 4a91072 to f976de0
all fixed (except the PR title, will fix soon), thanks a lot for pointing them out!
PR title fixed as well, thanks a lot Holden!
@@ -280,7 +275,8 @@ final class DataFrameWriter private[sql](df: DataFrame) {
      conn.close()
    }

    JdbcUtils.saveTable(df, url, table, props)
    val props2 = JDBCRelation.getEffectiveProperties(connectionProperties, this.extraOptions)
Why does this need to be created twice? Isn't props the same?
I guess I'm missing why a deep copy solves a problem of an unserializable property value. It still exists in the copy, right?
Hi Sean, the property value can only be mutated by the JDBC driver's schema-fetching function on the Spark driver, after which it is no longer serializable. My deep copy is made before that mutation.
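The failure mode being described can be sketched like this (everything below is a simulation for illustration: `NativeHandle`, `fetchSchemaLikeADriver`, and the property name are made up, not SAP HANA's or any real driver's behavior):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.Properties

// Stand-in for a driver-internal object that does not implement Serializable.
class NativeHandle

// Simulates a JDBC driver that stashes a non-serializable object into the
// Properties it is handed while resolving the schema on the Spark driver.
def fetchSchemaLikeADriver(props: Properties): Unit = {
  props.put("internalHandle", new NativeHandle)
}

def isJavaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch { case _: java.io.NotSerializableException => false }

val original = new Properties()
original.setProperty("user", "sa")

// Copy taken BEFORE the mutating call; only the copy is handed to the driver.
val copyForDriver = new Properties()
copyForDriver.putAll(original)
fetchSchemaLikeADriver(copyForDriver)

// The original, which Spark would later ship to executors, stays serializable;
// only the driver-facing copy is polluted.
assert(isJavaSerializable(original))
assert(!isJavaSerializable(copyForDriver))
```

This is why the timing matters: a copy taken after the schema fetch would already contain the unserializable value.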
Force-pushed from f976de0 to 17f01d2
import org.apache.spark.util.Utils
import org.scalatest.BeforeAndAfter
Nit: most of the import changes in this PR are actually the wrong way; can you restore the ordering?
@tribbloid are you still working on this? I had an outstanding question or two here.
@srowen yeah, I'll reply shortly; sorry, I wasn't aware of the email for a week.
Force-pushed from 7f5df5e to 267afca
@srowen thanks a lot for pointing out the problem in the import declarations. I've already corrected it and won't habitually reorder imports any more.
@srowen for your second question: the connection properties are deep-copied twice to ensure that the original object stays immutable. Reverting that breaks the scenario where the same properties object is used in two JDBC writes in short sequence. I haven't included this scenario in a unit test yet; do you prefer that I add one, or is that not expected?
I don't get that -- what does the original object matter if it's copied here? And how would the copy change?
@srowen Sorry, you are right, there is already a deep copy and I should just use that; will correct it immediately.
val oldDrivers = DriverManager.getDrivers.asScala.filter(_.acceptsURL("jdbc:h2:"))
oldDrivers.foreach(DriverManager.deregisterDriver)
DriverManager.registerDriver(UnserializableH2Driver)
Should probably be inside of the try block, no?
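The try/finally shape being suggested could look like the sketch below. `StubDriver` is a made-up minimal driver for illustration (the PR's actual test uses an `UnserializableH2Driver` wrapping H2); only the register/restore pattern is the point:

```scala
import java.sql.{Connection, Driver, DriverManager, DriverPropertyInfo}
import java.util.Properties
import java.util.logging.Logger
import scala.collection.JavaConverters._

// Minimal stub driver; a real test would delegate to H2.
object StubDriver extends Driver {
  override def acceptsURL(url: String): Boolean = url.startsWith("jdbc:stub:")
  override def connect(url: String, info: Properties): Connection = null
  override def getPropertyInfo(url: String, info: Properties): Array[DriverPropertyInfo] =
    Array.empty
  override def getMajorVersion: Int = 1
  override def getMinorVersion: Int = 0
  override def jdbcCompliant(): Boolean = false
  override def getParentLogger: Logger =
    throw new java.sql.SQLFeatureNotSupportedException
}

// Snapshot as a List: getDrivers returns a one-shot enumeration, a pitfall
// this PR's commit log also ran into.
val oldDrivers = DriverManager.getDrivers.asScala.toList
try {
  oldDrivers.foreach(DriverManager.deregisterDriver)
  DriverManager.registerDriver(StubDriver)
  // ... test body runs against the stub driver here ...
} finally {
  // Restore the original driver registrations even if the test body throws.
  DriverManager.deregisterDriver(StubDriver)
  oldDrivers.foreach(DriverManager.registerDriver)
}
```

Keeping the register/deregister calls inside try means the finally block always undoes whatever registration changes actually happened.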
Force-pushed from db957e4 to 91cf81c
Good! Boots on the ground.
Rebased with minimal changes to code style (reverted whitespace changes and corrected import order). All tests passed. Let's finish this.
Hey, there's still a bunch of review comments that I left which haven't been acknowledged or addressed! Mind replying to them?
Force-pushed from 7a80214 to 7bae97c
Yes, all comments resolved.
Hi Josh, do you see any problem? Jenkins should have tested it, and it's clean to be merged by now.
@@ -27,6 +27,8 @@ import org.apache.spark.sql.{DataFrame, Row, SaveMode, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types.StructType

import scala.collection.JavaConverters._
scala import should be grouped together with line 22.
https://github.com/apache/spark/pull/8785/files#diff-5f0d0643fcfad315df0fdd7cae52dfaeR22
Finally someone replied :)
Can one of the admins verify this patch?
Please wait for me to address the conflicts; I'll do this after the 2.0.0-preview main components become stable enough.
So 2.0.0-preview is already out and we are in RC2, so I wouldn't expect any big changes happening right now if you want to take the time to update the PR :)
Absolutely, & welcome to Toronto.
All conflicts fixed with minimal changes to the original patch that was peer-reviewed in Jan 2016. Requesting merge. WARNING: DataFrameWriter line 402 (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L402) could use JDBCRelation.getEffectiveProperties (line 106) but I didn't change it, to minimize the diff. Please advise if it has to be corrected.
Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties
add one more unit test
fix JDBCRelation & DataFrameWriter to pass all tests
revise scala style
put driver replacement code into a shared function
fix styling
upgrade to master and resolve all related issues
remove the useless second deep copy of properties
rename test names to be more explicit
minor refactoring based on Sean's suggestion
move JavaConverters import to under object
remove the redundant toSeq and pull up lines in brackets
improve styling in UnserializableDriverHelper and JDBCRelation
remove whitespace in JDBCRelation line 42
add back type qualifiers of parameter of getEffectiveProperties into JDBCRelation to allow mutable Map being used
fix a unit test error: DriverManager.getDrivers.asScala returns an iterator that can only be iterated once, this commit cast it into a list to be reusable
reformat import & styling
fix API invocation errors
remove several getEffectiveProperties invocations as deep copies are already implemented in some functions
change test name to start with "SPARK-10625"
@tribbloid is this a problem that needs to be fixed?
Hi Reynold,
Yes, as of 1.6.2 it made some JDBC drivers (notably the one for SAP HANA) malfunction.
The fix is easy, though I haven't tested whether it's already fixed in 2.0+.
Yours, Peng
This needs a rebase and there are still outstanding review comments (minor ones) |
ping @tribbloid. Are you able to proceed with the review comments? If not, it'd be better to close this for now.
@HyukjinKwon yeah, just need to rebase on 2.2-SNAPSHOT+ |
I am not supposed to decide what to merge, but I left the comment because this PR seems inactive with respect to the review comments, and I assumed it is currently abandoned, with the author unable to proceed further for now. I'd rebase, address the review comments, and keep pinging the related people here.
Closes apache#16819 Closes apache#13467 Closes apache#16083 Closes apache#17135 Closes apache#8785 Closes apache#16278 Closes apache#16997 Closes apache#17073 Closes apache#17220
Connection properties are now deep-copied before they are used by JDBC drivers; this solves all the problems in the unit tests.