
LIVY-194. Share SparkContext across languages #178

Closed
wants to merge 1 commit into from

Conversation

@zjffdu (Contributor) commented Aug 11, 2016

This PR shares the same SparkContext across languages (Scala/Python/R). It introduces a new shared interpreter kind that supports Scala, Python, and R, all backed by the same SparkContext.

Main changes:

  • A new interpreter kind, SharedInterpreter, which contains a map of the other interpreter kinds. The input code format for SharedInterpreter is %kind code (see the sketch after this list), e.g.
%spark sc.parallelize(1 to 10).sum()
  • SparkContext/SQLContext are created by SparkFactory so that the same SparkContext/SQLContext can be shared; the Scala, Python, and R interpreters all obtain their SparkContext/SQLContext from SparkFactory.
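
For illustration, below is a minimal, self-contained Scala sketch of the %kind dispatch idea. The trait and method names here are assumptions made for the sketch, not the actual classes in this PR:

trait Interpreter {
  def execute(code: String): String
}

// Hypothetical dispatcher: holds one interpreter per kind and routes "%kind code" input.
class SharedInterpreter(interpreters: Map[String, Interpreter]) extends Interpreter {
  override def execute(code: String): String = {
    val trimmed = code.trim
    require(trimmed.startsWith("%"), "input must be of the form: %kind code")
    val spaceIdx = trimmed.indexOf(' ')
    val (kind, body) =
      if (spaceIdx == -1) (trimmed.drop(1), "")
      else (trimmed.substring(1, spaceIdx), trimmed.substring(spaceIdx + 1))
    interpreters.get(kind) match {
      case Some(interp) => interp.execute(body)                 // delegate to the per-language interpreter
      case None         => s"Unknown interpreter kind: %$kind"  // expected kinds: spark, pyspark, sparkr
    }
  }
}

A SharedInterpreter built with Map("spark" -> scalaInterp, "pyspark" -> pythonInterp, "sparkr" -> rInterp) would route %spark sc.parallelize(1 to 10).sum() to the Scala interpreter, while all three interpreters obtain the same SparkContext/SQLContext from SparkFactory.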

@zjffdu (Contributor, Author) commented Aug 11, 2016

@vanzin Please help review the approach; I will add more tests later.

@codecov-io commented Aug 11, 2016

Current coverage is 69.83% (diff: 57.97%)

Merging #178 into master will decrease coverage by 0.91%

@@             master       #178   diff @@
==========================================
  Files            91         81    -10   
  Lines          4697       3915   -782   
  Methods           0          0          
  Messages          0          0          
  Branches        811        655   -156   
==========================================
- Hits           3323       2734   -589   
+ Misses          899        805    -94   
+ Partials        475        376    -99   

Powered by Codecov. Last update 69ac11e...2e4dd74

@alex-the-man (Contributor) commented Aug 11, 2016

Besides using different languages with the same SparkContext, we also want to support the use case where multiple interactive sessions share the same SparkContext.

Some of our customers have a team of data scientists working on the same set of data. The data scientists want to share cached RDDs while still having their own interactive sessions, and most of them want to use Python. I think your approach doesn't support this use case because there can be at most one interpreter per language.

One alternative is, instead of creating a new SharedInterpreter, to make interpreters children of interactive sessions. Each session has exactly one SparkContext and can have multiple interpreters. The SparkContext is shared among all interpreters in that session, and interpreters in a session can use the same or different languages.

The REST interface will look like this:
To create a session (for a SparkContext):

POST /sessions

Then, to create an interpreter:

POST /sessions/0/interpreter
{"kind":"pyspark"}

To post a statement:

POST /sessions/0/interpreter/0/statements
{"code":"1+1"}

I think this design is more flexible and supports more potential use cases. What do you think?

@zjffdu (Contributor, Author) commented Aug 12, 2016

Thanks @tc0312 for the quick feedback. Your use case is very interesting, but I feel the scope is a little big and it doesn't seem easy to implement, so I would suggest putting it in another ticket. Here are my concerns:

  • It adds a new REST API for users. We need to be careful about adding new REST APIs, because once added we have to maintain them for backward compatibility.
  • How do we create a new interpreter in an existing session? Say we have created such a session (the YARN app is running) and now want to add a new PySparkInterpreter. How do we ask the driver to launch another Python process? It seems we would need a new protocol between the RSC client and server for that. And even if we implement it, I am concerned about the scalability of running multiple SparkIMain/Python/R processes in one JVM.

@alex-the-man (Contributor) commented
Can you close and reopen the PR to run the tests again?

@zjffdu closed this Aug 17, 2016
@zjffdu reopened this Aug 17, 2016
@alex-the-man (Contributor) commented
This indicates my test is flaky. I will stress test it tomorrow.

@alex-the-man (Contributor) commented
The previous test failure was caused by a bad timeout value. I'm going to fix it in LIVY-186.

@zjffdu (Contributor, Author) commented Dec 13, 2016

@tc0312 @linchan-ms Have you made any progress on this, such as a design doc or something else?
This PR would implement SparkContext sharing between languages. If your idea of multiple interpreters per language would not break this, then I think I can continue this work as part of the whole implementation. What do you think?

@alex-the-man (Contributor) commented
We talked to @felixcheung recently, and he told us that, depending on configuration, Zeppelin supports multiple interpreters per language too.

@zjffdu (Contributor, Author) commented Dec 13, 2016

That's correct. Zeppelin has a mode named scoped which supports multiple interpreters per language, but SparkContext sharing across languages is supported regardless of that configuration.

@alex-the-man (Contributor) commented Dec 14, 2016

If a customer wants two PySpark interpreters with different virtualenvs or Python versions for visualization and one Scala interpreter for computation, this PR doesn't seem to support that.

@jerryshao (Contributor) commented
I think what @tc0312 mentioned about multiple interpreters per SparkContext could cover the scenario here. The shared session is just one specific use case of multiple interpreters per SparkContext (one Python/Scala/R interpreter per SparkContext).

For better code design and evolution, I would suggest building a skeleton for multiple interpreters per SparkContext first; then we could add the shared session as a special case (if this feature is not urgent).

@alex-the-man (Contributor) commented Dec 16, 2016

One thing our customers want in Zeppelin and Jupyter is sharing the SparkContext but not variables; I think that's scoped mode in Zeppelin. This PR doesn't support that use case.

@gss2002 commented Feb 8, 2017

@tc0312 and @zjffdu, is there any status on this initiative? We keep getting asked about context sharing between the different languages from Zeppelin, specifically SparkR to Scala for DataFrames and such.

@zjffdu (Contributor, Author) commented Feb 10, 2017

Thanks for your interest in this feature, @gss2002; we are working on a more sophisticated design.

@zjffdu closed this Aug 24, 2017