Support cluster mode in PySpark #2197

Merged Aug 9, 2017 (3 commits)


Contributor

@interskh interskh commented Aug 2, 2017

Description

Support cluster mode in PySpark

Motivation and Context

We want to use cluster deploy mode for PySpark tasks, as is already possible for other Spark tasks.

Have you tested this? If so, how?

We have been running it in production for a couple of months.

Tested with Spark 2.1.

@mention-bot

@interskh, thanks for your PR! By analyzing the history of the files in this pull request, we identified @jthi3rry, @ehdr and @ntim to be potential reviewers.

@@ -237,8 +237,6 @@ class PySparkTask(SparkSubmitTask):

     # Path to the pyspark program passed to spark-submit
     app = os.path.join(os.path.dirname(__file__), 'pyspark_runner.py')
-    # Python only supports the client deploy mode, force it
-    deploy_mode = "client"
Collaborator

Was this an intentional deletion? Why not just allow overwrite of deploy_mode?

Contributor Author

deploy_mode = "client" overrides the deploy_mode inherited from SparkSubmitTask. It was there to force client deploy mode, because cluster deploy mode was not supported before. Now that it is supported, there is no need to pin it to client.
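
The effect of such a class-level pin can be shown with a minimal sketch. It assumes a simplified base class in which `deploy_mode` is a property backed by configuration. `BaseSubmitTask`, `PinnedTask`, `UnpinnedTask` and the `config` dict are hypothetical stand-ins for illustration, not luigi's actual code.

```python
class BaseSubmitTask:
    """Hypothetical stand-in for a spark-submit style base task."""

    # Assumed configuration source; a real task would read a config file.
    config = {"deploy-mode": "cluster"}

    @property
    def deploy_mode(self):
        # The base class defers to whatever the configuration says.
        return self.config.get("deploy-mode")

    def command_args(self):
        # Only emit the flag when a deploy mode is set.
        mode = self.deploy_mode
        return ["--deploy-mode", mode] if mode else []


class PinnedTask(BaseSubmitTask):
    # A plain class attribute shadows the inherited property,
    # so the configured value is never consulted.
    deploy_mode = "client"


class UnpinnedTask(BaseSubmitTask):
    # Without the pin, the configured deploy mode is used.
    pass


print(PinnedTask().command_args())
print(UnpinnedTask().command_args())
```

Under the same assumptions, a task that still needs client mode could set `deploy_mode = "client"` on its own subclass instead of having it forced on every task.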

Collaborator

@dlstadther dlstadther left a comment

I don't use this module, but the edits seem to be logical.

If you wouldn't mind tagging some other PySpark contributors for review, that'd be great!

Thanks

Contributor Author

interskh commented Aug 8, 2017

@jthi3rry @ivannotes @ntim What do you think of this PR? I tagged you because you have contributed to the PySpark class :)

Contributor Author

interskh commented Aug 8, 2017

@dlstadther @Tarrasch I remember there used to be a bot that automatically tagged people for reviews. What happened to it?

Contributor

ntim commented Aug 9, 2017

👍 Looking forward to trying it!

Contributor

@Tarrasch Tarrasch left a comment

@interskh I don't know what happened to mention-bot. :)

Contributor

jthi3rry commented Aug 9, 2017

LGTM!

@Tarrasch Tarrasch merged commit 95b7da2 into spotify:master Aug 9, 2017
Contributor

Tarrasch commented Aug 9, 2017

Thanks @interskh!

@interskh interskh deleted the kyle_pyspark_cluster branch September 25, 2017 21:22
This was referenced Jun 29, 2022