Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3332] Revert spark-ec2 patch that identifies clusters using tags #2225

Closed

Conversation

JoshRosen
Copy link
Contributor

This reverts #1899 and #2163, two patches that modified spark-ec2 so that clusters are identified using tags instead of security groups. The original motivation for this patch was to allow multiple clusters to run in the same security group.

Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach we have the possibility of spark-ec2 launching instances and crashing before they can be tagged, effectively orphaning those instances. The orphaned instances won't belong to any cluster, so the spark-ec2 script will be unable to clean them up.

Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or maybe using another mechanism to group instances into clusters. For the 1.1.0 release, though, I propose that we just revert this patch.

@JoshRosen
Copy link
Contributor Author

Both patches reverted cleanly using git revert, but I'd still appreciate manual review to make sure I haven't broken anything. Going to test this shortly.

@SparkQA
Copy link

SparkQA commented Aug 31, 2014

QA tests have started for PR 2225. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19546/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 31, 2014

QA results for PR 2225:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19546/consoleFull

@JoshRosen
Copy link
Contributor Author

Since the original two commits, there were only two intervening commits to spark_ec2.py, c71b5c6 and 0c94a5b, both of which have the changes correctly preserved here.

(see https://github.com/JoshRosen/spark/commits/revert-ec2-cluster-naming/ec2/spark_ec2.py)

@shivaram
Copy link
Contributor

shivaram commented Sep 1, 2014

LGTM. Hmm - This is unfortunate and something that I feared given the flaky experiences I have had with tags before. I checked the intervening commits too and the reverts look good to me.

Could you also manually test by launching a EC2 cluster just to be doubly sure ?

@JoshRosen
Copy link
Contributor Author

I launched a cluster earlier using this version of the script and shut it down using the v1.0.2 one.

@nchammas
Copy link
Contributor

nchammas commented Sep 1, 2014

FYI: It looks like branch-1.1 has an out-of-date version of run-tests-jenkins. Might want to update that.

@pwendell
Copy link
Contributor

pwendell commented Sep 2, 2014

okay cool - let's merge this then (it isn't tested by jenkins anyways) although it would be nice to update that script in branch-1.1.

@nchammas
Copy link
Contributor

nchammas commented Sep 2, 2014

it would be nice to update that script in branch-1.1

@pwendell I've done this in #2237.

@JoshRosen
Copy link
Contributor Author

I've merged this into branch-1.1.

asfgit pushed a commit that referenced this pull request Sep 2, 2014
This reverts #1899 and #2163, two patches that modified `spark-ec2` so that clusters are identified using tags instead of security groups.  The original motivation for this patch was to allow multiple clusters to run in the same security group.

Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach we have the possibility of `spark-ec2` launching instances and crashing before they can be tagged, effectively orphaning those instances.  The orphaned instances won't belong to any cluster, so the `spark-ec2` script will be unable to clean them up.

Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or maybe using another mechanism to group instances into clusters.  For the 1.1.0 release, though, I propose that we just revert this patch.

Author: Josh Rosen <[email protected]>

Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the following commits:

0c18e86 [Josh Rosen] Revert "SPARK-2333 - spark_ec2 script should allow option for existing security group"
c2ca2d4 [Josh Rosen] Revert "Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this""
@JoshRosen JoshRosen closed this Sep 2, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants