Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19862] 'tungsten-sort' should be deleted in SparkEnv.scala #17220

Closed
wants to merge 1 commit into from
Closed

[SPARK-19862] 'tungsten-sort' should be deleted in SparkEnv.scala #17220

wants to merge 1 commit into from

Conversation

guoxiaolongzte
Copy link

JIRA Issue: https://github.com/guoxiaolongzte/spark/tree/SPARK-19862

In SparkEnv.scala,remove tungsten-sort.Because it is not represent 'org.apache.spark.shuffle.unsafe.UnsafeShuffleManager'.So it should by delete.Only in this way, it will not cause user ambiguity.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@mridulm
Copy link
Contributor

mridulm commented Mar 9, 2017

If this was an exposed parameter, we cannot remove it - irrespective of the duplication.

@guoxiaolongzte
Copy link
Author

In spark 1.4.1, you delete the parameter of 'hash'.I think it should be deleted.In the spark website indicated in the documents, should not keep this logic in the code.

1.4.1****
val shortShuffleMgrNames = Map(
"hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
"sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager",
"tungsten-sort" -> "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager")
1.4.1****

@jerryshao
Copy link
Contributor

Can you please fix the title like what other PR did.

@guoxiaolongzte guoxiaolongzte changed the title remove tungsten-sort.Because it is not represent 'org.apache.spark.sh… [SPARK-19862] 'tungsten-sort' should be deleted in SparkEnv.scala Mar 9, 2017
@guoxiaolongzte
Copy link
Author

Ok, I have modified the title.

@rxin
Copy link
Contributor

rxin commented Mar 9, 2017

Is this change even correct? This is here for backward compatibility.

@guoxiaolongzte
Copy link
Author

I think the compatibility, the resulting shuffle manager is not I want.Only the parameter values' sort real SortShuffleManager said.

@rxin
Copy link
Contributor

rxin commented Mar 9, 2017

I don't think you understand this. This value is here so if at some point some user picked tungsten-sort, we won't break it. In recent versions of Spark the default sort manager accomplishes the thing as the old tungsten sort.

@rxin
Copy link
Contributor

rxin commented Mar 9, 2017

If anything, we should just update the file to add a line of comment to make sure people don't delete this in the future.

@guoxiaolongzte
Copy link
Author

I think I should delete, update in the document at the same time, so that to ensure the uniqueness of function.

@jerryshao
Copy link
Contributor

jerryshao commented Mar 9, 2017

@guoxiaolongzte , I think here though "tungsten-sort" is the same as "sort" now, for the configuration backward compatibility we still need to keep it. If somehow user still configured with "tungsten-sort", with your change the application will be failed. I think that's what @rxin mentioned about.

Looking at some other configurations, we typically keep backward compatibility unless major release (Spark 1.6 to Spark 2.0).

@guoxiaolongzte
Copy link
Author

thanks.I understand this.

@guoxiaolongzte
Copy link
Author

why HashShuffleManager have been deleted.

@jerryshao
Copy link
Contributor

Hash-based shuffle has some problems with large number of partitions, and part of hash-based shuffle's feature has already been incorporated into sort-based shuffle. Spark's sort-based shuffle is not pure sort-based like MR, it is actually a mixed pattern depends on the partition numbers.

srowen added a commit to srowen/spark that referenced this pull request Mar 22, 2017
@srowen srowen mentioned this pull request Mar 22, 2017
@asfgit asfgit closed this in b70c03a Mar 23, 2017
@guoxiaolongzte guoxiaolongzte deleted the SPARK-19862 branch March 31, 2017 06:43
@guoxiaolongzte
Copy link
Author

@jerryshao@rxin@srowen
In spark2.1.0,"tungsten-sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName has been deleted,but you didn't agree with my issue SPARK-19862.why?

@jerryshao
Copy link
Contributor

jerryshao commented Mar 31, 2017

What's the meaning of "has been deleted in Spark 2.1.0"? I think the reason mention above is quite clear.

@guoxiaolongzte
Copy link
Author

spark2.0.2
val shortShuffleMgrNames = Map(
"sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName,
"tungsten-sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName)

spark2.1.0
val shortShuffleMgrNames = Map(
"sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName)

The above is based on the analysis of release the original.

@jerryshao
Copy link
Contributor

Well, I still saw "tungsten-sort" in branch 2.1 and master (https://github.com/apache/spark/blob/branch-2.1/core/src/main/scala/org/apache/spark/SparkEnv.scala#L320).

Can you tell which code did you check?

@guoxiaolongzte
Copy link
Author

guoxiaolongzte commented Mar 31, 2017

Sorry,i watch the wrong,i download the master of the branch 2.1.My issue also mentioned to remove the code, and should not be Resolution: Won't Fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants