LIVY-194. Share SparkContext across languages #178
Conversation
@vanzin Please help review the approach; I will add more tests later.
Current coverage is 69.83% (diff: 57.97%)

             master    #178    diff
  ==========================================
    Files        91      81     -10
    Lines      4697    3915    -782
    Methods       0       0
    Messages      0       0
    Branches    811     655    -156
  ==========================================
  - Hits      3323    2734    -589
  + Misses     899     805     -94
  + Partials   475     376     -99
Besides using different languages with the same SparkContext, we also want to support the use case where multiple interactive sessions share the same SparkContext. Some of our customers have a team of data scientists working on the same set of data. The data scientists want to share cached RDDs while still having their own interactive sessions, and most of them want to use Python. I think your approach doesn't support this use case because there can be at most one interpreter per language. One alternative is, instead of creating a new SharedInterpreter, to make interpreters children of interactive sessions. Each session has exactly one SparkContext and can have multiple interpreters. The SparkContext is shared among all interpreters in that session, and interpreters in a session can use the same or different languages. The REST interface will look like this:
Then, to create an interpreter:
To post a statement:
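The code blocks for these REST calls were not preserved in this copy of the conversation. As an illustration only, here is a minimal sketch of what the proposed session/interpreter hierarchy could look like as requests; the endpoint paths and field names are assumptions, not taken from the original PR.

```python
import json

# Hypothetical sketch of the proposed REST calls. The paths
# /sessions/{id}/interpreters and .../statements are assumptions
# illustrating the "interpreters as children of sessions" design.

def create_interpreter_request(session_id, kind):
    """Build a request to add an interpreter to an existing session."""
    return {
        "method": "POST",
        "path": f"/sessions/{session_id}/interpreters",
        # kind could be e.g. "spark", "pyspark", or "sparkr"
        "body": json.dumps({"kind": kind}),
    }

def post_statement_request(session_id, interpreter_id, code):
    """Build a request to run a statement in a specific interpreter."""
    return {
        "method": "POST",
        "path": f"/sessions/{session_id}/interpreters/{interpreter_id}/statements",
        "body": json.dumps({"code": code}),
    }

print(create_interpreter_request(0, "pyspark")["path"])
print(post_statement_request(0, 1, "1 + 1")["path"])
```

Under this design, two data scientists could each create their own pyspark interpreter in the same session and see the same cached RDDs.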
I think this design is more flexible and supports more potential use cases. What do you think?
Thanks @tc0312 for the quick feedback. Your use case is very interesting, but I feel the scope is a little big and does not seem easy to implement, so I would suggest putting it in another ticket. Here are my concerns.
Can you close and reopen the PR to run the tests again?
This indicates my test is flaky. I will stress-test it tomorrow.
The previous test failure was caused by a bad timeout value. I'm going to fix it in LIVY-186.
Force-pushed from 39a5162 to 2d6e026
@tc0312 @linchan-ms Do you have any progress on this, like a design doc or something else?
We talked to @felixcheung recently and he told us that, depending on configuration, Zeppelin supports multiple interpreters per language too.
That's correct. Zeppelin has a mode named scoped which supports multiple interpreters per language. But SparkContext sharing across languages is supported regardless of the configuration.
If a customer wants two PySpark interpreters with different virtualenvs or Python versions for visualization, plus one Scala interpreter for computation, this PR doesn't seem to support that.
I think what @tc0312 mentioned about multiple interpreters per SparkContext could cover the scenario here. The shared session is just one specific use case of multiple interpreters per SparkContext (one Python/Scala/R interpreter per SparkContext). For better code design and evolution, I would suggest building a skeleton of multiple interpreters per SparkContext first; then we could add the shared session as a special case (if this feature is not urgent).
One thing our customers want in Zeppelin and Jupyter is sharing the SparkContext but not variables.
@tc0312 and @zjffdu, any status on this initiative? We keep getting asked about context sharing between the different languages from Zeppelin, specifically SparkR to Scala for DataFrames and such.
Thanks for your interest in this feature @gss2002; we are working on a more sophisticated design.
This PR is to share the same SparkContext across languages (Scala/Python/R). I introduce a new kind of interpreter, shared, which supports Scala/Python/R; they all share the same SparkContext. Main changes: add SharedInterpreter, which contains a map of the other kinds of interpreters. The input code format of SharedInterpreter should be %kind code, e.g.
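The concrete example after "e.g." was lost from this copy of the conversation. Purely as an illustration of the %kind code idea, here is a hedged sketch of how such a prefix might be split into an interpreter kind and a code body; this is not the PR's actual Scala implementation, and the recognized kind names (spark, pyspark, sparkr) are an assumption for the example.

```python
# Illustrative sketch of routing "%kind code" input to an interpreter.
# NOT the PR's actual implementation; kind names are assumptions.
KNOWN_KINDS = {"spark", "pyspark", "sparkr"}

def split_kind(code, default_kind="spark"):
    """Return (kind, body) for input like '%pyspark print(1 + 1)'."""
    stripped = code.lstrip()
    if stripped.startswith("%"):
        head, _, body = stripped.partition(" ")
        kind = head[1:]
        if kind in KNOWN_KINDS:
            return kind, body.lstrip()
    # No recognized magic prefix: route to the default interpreter.
    return default_kind, code

print(split_kind("%pyspark print(1 + 1)"))  # ('pyspark', 'print(1 + 1)')
```

A dispatcher like SharedInterpreter could then look up the interpreter for the returned kind in its map and forward the body to it.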