
LIVY-19. Add Spark SQL support #148

Open, wants to merge 1 commit into base: master

meisam
Contributor

meisam commented Jun 10, 2016

LIVY-19. Add Spark SQL support

This leaves the SparkInterpreter untouched and confines most of the changes to the livy-repl component.
The SparkSqlInterpreter is based on SparkInterpreter and needs to be polished.
Your feedback is really appreciated.

Task-Url: https://issues.cloudera.org/browse/LIVY-19

@ksakellis

Please update this PR to point to the Livy bug: LIVY-19

Implementing a Livy SparkSql Interpreter.

This leaves the SparkInterpreter untouched and confines most of
the changes to the livy-repl component.

Task-Url: https://issues.cloudera.org/browse/LIVY-19
@meisam
Contributor Author

meisam commented Jun 10, 2016

I updated the pull request description and reworded the commits to link to LIVY-19.

@codecov-io

codecov-io commented Jun 10, 2016

Codecov Report

Merging #148 into master will decrease coverage by 10.71%.

@@             Coverage Diff             @@
##           master     #148       +/-   ##
===========================================
- Coverage   71.53%   60.83%   -10.71%     
===========================================
  Files          91       73       -18     
  Lines        4697     3965      -732     
  Branches      811      651      -160     
===========================================
- Hits         3360     2412      -948     
- Misses        861     1243      +382     
+ Partials      476      310      -166
Impacted Files Coverage Δ
...c/main/scala/com/cloudera/livy/sessions/Kind.scala 0% <ø> (-50%)
...a/com/cloudera/livy/repl/SparkSqlInterpreter.scala 0% <ø> (ø)
...main/scala/com/cloudera/livy/repl/ReplDriver.scala 66.66% <ø> (+34.16%)
api/src/main/java/com/cloudera/livy/JobHandle.java 0% <ø> (-100%)
core/src/main/scala/com/cloudera/livy/Utils.scala 0% <ø> (-93.75%)
...ore/src/main/scala/com/cloudera/livy/Logging.scala 0% <ø> (-81.82%)
...ain/scala/com/cloudera/livy/server/WebServer.scala 0% <ø> (-61.23%)
...cala/com/cloudera/livy/sessions/SessionState.scala 0% <ø> (-44.45%)
...ain/java/com/cloudera/livy/rsc/FutureListener.java 33.33% <ø> (-33.34%)
... and 74 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 69ac11e...3ee5815.

@meisam changed the title from "DTTAHOE-77: Create SQL Kind Session on Livy Job Server" to "LIVY-19. Add Spark SQL support" on Jun 10, 2016

override def close(): Unit = synchronized {
  if (hiveContext != null) {
    // clean up and close hive context here
  }
Contributor

Maybe set hiveContext to null here.
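
A minimal sketch of what this suggestion might look like. `HiveContextStandIn` and `SqlInterpreterSketch` are hypothetical placeholders, not classes from the patch, and the real cleanup calls are left out because the thread does not show them.

```scala
// Hypothetical placeholder for the real Hive context type used by the patch.
final class HiveContextStandIn

// Hypothetical interpreter skeleton; only the close() logic is of interest.
final class SqlInterpreterSketch {
  private var hiveContext: HiveContextStandIn = new HiveContextStandIn

  // True once the context reference has been released.
  def isClosed: Boolean = synchronized { hiveContext == null }

  def close(): Unit = synchronized {
    if (hiveContext != null) {
      // clean up and close hive context here
      hiveContext = null // drop the reference so a repeated close() is a no-op
    }
  }
}
```

Clearing the reference inside the `synchronized` block keeps the null check and the reset atomic with respect to other callers of `close()`.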

@zjffdu
Contributor

zjffdu commented Jun 13, 2016

Another thought: maybe we can reuse the SparkContext in SparkInterpreter. We could just add SQL support to the REST API, e.g.:

curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"sql":"select * from table_1"}'

That way, in tools like Hue and Zeppelin, different paragraphs can share data easily because they share the same SparkContext.

@vanzin
Contributor

vanzin commented Jun 13, 2016

Hi @meisam,

Before we can even look at this patch, we need a signed ICLA and, if it applies, a signed CCLA from your employer. Please check the wiki: https://github.com/cloudera/livy/wiki/Contributing-to-Livy

@vanzin
Contributor

vanzin commented Jun 13, 2016

> maybe we can reuse the SparkContext in SparkInterpreter

Without having looked at the code, I like that idea.

@meisam
Contributor Author

meisam commented Jun 13, 2016

@vanzin
Where can I find the link to the CCLA? I'll work with my manager to have it signed and sent.

@vanzin
Contributor

vanzin commented Jun 13, 2016

It's at the link I posted above.

@meisam
Contributor Author

meisam commented Jun 13, 2016

@zjffdu I was thinking more along these lines:

curl http://localhost:8998/sessions/0/sql-statements -X POST -H 'Content-Type: application/json' -d '{"code":"select * from table_1"}'

This keeps com.cloudera.livy.ExecuteRequest untouched.
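
As a rough illustration of why a separate endpoint leaves the request type alone: both endpoints can accept the same `{"code": ...}` body, and only the path decides which interpreter runs it. The case class below is a stand-in for `com.cloudera.livy.ExecuteRequest` (its real definition is assumed here, not quoted), and `StatementRouting` is a hypothetical helper, not Livy's servlet code.

```scala
// Stand-in for com.cloudera.livy.ExecuteRequest; the real class is assumed to carry a code field.
final case class ExecuteRequest(code: String)

// Hypothetical routing helper written only for this illustration.
object StatementRouting {
  sealed trait Target
  case object SessionInterpreter extends Target
  case object SqlInterpreter extends Target

  // Regex patterns used in a match must cover the whole path.
  private val Statements = "/sessions/(\\d+)/statements".r
  private val SqlStatements = "/sessions/(\\d+)/sql-statements".r

  // Returns (session id, target interpreter, code), or None for an unknown path.
  def route(path: String, req: ExecuteRequest): Option[(Int, Target, String)] =
    path match {
      case Statements(id)    => Some((id.toInt, SessionInterpreter, req.code))
      case SqlStatements(id) => Some((id.toInt, SqlInterpreter, req.code))
      case _                 => None
    }
}
```

By contrast, the `{"sql": ...}` variant would need a new field or a second request type, since the body shape itself changes.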

@zjffdu
Contributor

zjffdu commented Jun 14, 2016

@meisam That makes sense.

@ksakellis

@meisam
Contributor Author

meisam commented Oct 10, 2016

@ksakellis I mailed the ICLA and CCLA to [email protected] last week.

@alex-the-man
Contributor

alex-the-man commented Nov 2, 2016

Livy interpreters support 2 magics: %json and %table.
I think we can add another magic, %sql, to support this use case without adding a new REST API.

To run a SQL statement, a user can just POST to sessions/<id>/statements with {"code":"%sql <sql>"}
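
A sketch of how a client might build that request body. Multi-line SQL has to be JSON-escaped before it goes into the `code` field; `SqlMagicRequest` is a hypothetical helper written for illustration, not part of Livy.

```scala
// Hypothetical client-side helper; it only builds the JSON text and sends nothing.
object SqlMagicRequest {
  // Escapes the characters that JSON requires to be escaped inside a string.
  def jsonEscape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.toString
  }

  // Builds {"code":"%sql <sql>"} for the existing statements endpoint.
  def body(sql: String): String =
    "{\"code\":\"" + jsonEscape("%sql " + sql) + "\"}"
}
```

The resulting string would be sent as the body of the POST to `sessions/<id>/statements`, the same endpoint the other magics already use.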

@zjffdu
Contributor

zjffdu commented Nov 2, 2016

Before that, I think we should implement SparkContext sharing across languages (#178), because users very often want to access a table that was created in the spark/pyspark/sparkr interpreter.

@alex-the-man
Contributor

Sorry, let me clarify. I'm proposing to add a %sql magic to all interpreters instead of creating a new interpreter. It shouldn't depend on #178.

@meisam
Contributor Author

meisam commented Nov 2, 2016

@tc0312 How would a %sql magic work with SQL code that spans multiple lines? Or with queries that contain comments?

@alex-the-man
Contributor

alex-the-man commented Nov 2, 2016

It's up to us to define how %sql magic works. We can define it in a way that it works with multiple lines and comments.
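
One possible definition, sketched below under the assumption that everything after the `%sql` prefix is handed unchanged to the SQL parser, so line breaks and comments become the parser's concern rather than the magic's. `SqlMagic.extract` is a hypothetical name, and Livy's existing magic handling may work differently.

```scala
// Hypothetical magic recogniser; not taken from Livy's interpreters.
object SqlMagic {
  private val Prefix = "%sql"

  // Returns the SQL text when `code` starts with the %sql magic, otherwise None.
  // The SQL is returned as written, including newlines and comments.
  def extract(code: String): Option[String] = {
    val trimmed = code.trim
    val prefixEndsCleanly =
      trimmed.length == Prefix.length || trimmed.charAt(Prefix.length).isWhitespace
    if (trimmed.startsWith(Prefix) && prefixEndsCleanly) {
      val sql = trimmed.drop(Prefix.length).trim
      if (sql.isEmpty) None else Some(sql)
    } else {
      None
    }
  }
}
```

Because the statement is treated as a single unit rather than line by line, a query such as `%sql` followed by a comment line and a two-line `select` comes through intact.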

@meisam
Contributor Author

meisam commented Nov 2, 2016

@tc0312 I guess I should clarify myself. My question is, which approach is easier to implement? Having a %sql magic? Or having a separate SQL interpreter?

@meisam
Contributor Author

meisam commented Nov 2, 2016

@tc0312 Actually there's a third approach that @zjffdu suggested: '{"sql":"select * from table_1"}'

@alex-the-man
Contributor

I think they are both easy to implement. I would prefer the magic approach because Livy already has magics and cloudera/hue is using them.

@sven0726

What was the final result of this issue?

@zjffdu
Contributor

zjffdu commented May 31, 2017

It needs a more careful design; see https://issues.cloudera.org/browse/LIVY-194
We'd like to share the SparkContext across Scala/Python/R/SQL.

@pkasinathan

pkasinathan commented Jun 1, 2017

Hi @zjffdu,

Enabling a shared Spark context across Scala/Python/R is the next phase, and it may take some time.

But since we already support Scala, Python and R interpreters, can we also add a SQL interpreter?

If we have a direct SQL interpreter enabled, it will be very easy for users to submit SQL statements directly to the interpreter instead of wrapping them with HiveContext (< 2.0) or SparkSession (>= 2.0), or using the SQL magic every time.

Please let me know your thoughts.

Prabhu
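
For context, the wrapping described above might look like the following sketch, which generates the Scala snippet a user would otherwise type by hand. It assumes the session binds `sqlContext` before Spark 2.0 and `spark` from 2.0 on; `SqlWrapper` is a hypothetical helper, not part of Livy.

```scala
// Hypothetical helper that turns a SQL string into Scala code for a Spark session.
object SqlWrapper {
  // `sqlContext` and `spark` are the variable names assumed to exist in the session.
  def wrap(sql: String, sparkMajorVersion: Int): String = {
    // A triple quote inside the SQL would end the generated string literal early.
    require(!sql.contains("\"\"\""), "SQL containing triple quotes is not handled in this sketch")
    val entryPoint = if (sparkMajorVersion >= 2) "spark" else "sqlContext"
    entryPoint + ".sql(\"\"\"" + sql + "\"\"\").show()"
  }
}
```

A dedicated SQL interpreter or a magic would move this step to the server side, so clients could send the bare query.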
