Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6479][Block Manager]Create off-heap block storage API #5430

Closed
wants to merge 7 commits into from

Conversation

zhzhan
Copy link
Contributor

@zhzhan zhzhan commented Apr 8, 2015

This is the classes for creating off-heap block storage API. It also includes the migration for Tachyon. The diff seems to be big, but it mainly just rename tachyon to offheap. New implementation for hdfs will be submit for review in spark-6112.

@zhzhan zhzhan changed the title [SPARK-6479][SPARK-CORE]Create off-heap block storage API [SPARK-6479][Block Manager]Create off-heap block storage API Apr 8, 2015
@rxin
Copy link
Contributor

rxin commented Apr 9, 2015

To help me understand this patch, can you also put the Tachyon implementation into this?

@SparkQA
Copy link

SparkQA commented Apr 9, 2015

Test build #29903 has finished for PR 5430 at commit b754d2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait OffHeapBlockManager
  • This patch does not change any dependencies.

@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 9, 2015

@rxin I have attached the patch with Tachyon migration code to the JIRA.

https://issues.apache.org/jira/secure/attachment/12724088/spark-6479-tachyon.patch

The patch is incomplete on purpose, because most of the diff (not included) is just changing the term from tachyon to offheap. If you think it is better to do tachyon migration with this JIRA, please let me know and I will do it in one shot.

By the way, there is minor change in OffHeapStore and OffHeapBlockManager, which is inconsistent with this PR, please ignore it.

val subDirsPerDir = 64
def create(blockManager: BlockManager,
executorId: String): Option[OffHeapBlockManager] = {
val sNames = blockManager.conf.getOption("spark.offHeapStore.blockManager")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think default value is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User may not want to use offheap, and in this case the OffHeapBlockManager will be None.

@rxin
Copy link
Contributor

rxin commented Apr 9, 2015

Thanks - let's put the Tachyon patch with this. In this case, I think it will be easier to review and understand the API semantics.

override def getValues(blockId: BlockId): Option[Iterator[Any]] = {
getBytes(blockId).map(buffer => blockManager.dataDeserialize(blockId, buffer))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the extra blank line here when you update

@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 10, 2015

@rxin This is the complete change for offheap api, and tachyon migration code. There is no any logical change in tachyong (just move code around), except changing one system.ext to throw exceptions in TachyonBlockManager initialization part.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #29997 has finished for PR 5430 at commit e4814df.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #30017 has finished for PR 5430 at commit bf058bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

// Add a timestamp as the suffix here to make it more safe
val tachyonFolderName = "spark-" + randomUUID.toString()
conf.set("spark.tachyonStore.folderName", tachyonFolderName)
val offHeapFolderName = "spark-" + randomUUID.toString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API breaking change. We need to keep the old one around (as an alias), and deprecate it.

@rxin
Copy link
Contributor

rxin commented Apr 10, 2015

@zhzhan I also left a high level comment on JIRA - it'd be better to call this external block store, rather than off-heap store.

@SparkQA
Copy link

SparkQA commented Apr 14, 2015

Test build #30203 has finished for PR 5430 at commit 0a49021.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 20, 2015

@rxin Could you help to review the patch and let me know if you have any concern?

@SparkQA
Copy link

SparkQA commented Apr 20, 2015

Test build #30609 has finished for PR 5430 at commit d1c9921.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30614 has finished for PR 5430 at commit 3ea0689.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@rxin
Copy link
Contributor

rxin commented Apr 22, 2015

Sorry for the delay - I will look at this again today.

}

def blockStatusFromJson(json: JValue): BlockStatus = {
val storageLevel = storageLevelFromJson(json \ "Storage Level")
val memorySize = (json \ "Memory Size").extract[Long]
val diskSize = (json \ "Disk Size").extract[Long]
val tachyonSize = (json \ "Tachyon Size").extract[Long]
BlockStatus(storageLevel, memorySize, diskSize, tachyonSize)
val extBlkStoreSize = (json \ "ExtBlkStore Size").extract[Long]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to handle backwards compatibility - i.e. logs from older versions of Spark where it says Tachyon. If you look there are other backwards compatibility tests relating to this protocol too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the "Tachyon" version is present, I'd look for that and then just convert it.

* desc for the implementation.
*
*/
def desc(): String = {"External Block Store"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is desc going to be used? maybe we should just override toString?

@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 25, 2015

Jenkins, retest this please.

@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 25, 2015

Jenkins, test it please.

@pwendell
Copy link
Contributor

Jenkins, test this please.

1 similar comment
@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 27, 2015

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31175 has finished for PR 5430 at commit a80d50c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 29, 2015

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31197 has finished for PR 5430 at commit a80d50c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@zhzhan
Copy link
Contributor Author

zhzhan commented Apr 29, 2015

@rxin @pwendell Can you help review the new patch, and let me know if you have any concern?

@pwendell
Copy link
Contributor

pwendell commented May 1, 2015

I took as pass and this LGTM. However it needs to be brought up to date.

@zhzhan
Copy link
Contributor Author

zhzhan commented May 1, 2015

I think Spark-5213 fail the mina test

#4015

[info] spark-sql: found 1 potential binary incompatibilities (filtered 129)
[error] * method sqlParser()org.apache.spark.sql.SparkSQLParser in class org.apache.spark.sql.SQLContext does not have a correspondent in new version
[error] filter with: ProblemFilters.excludeMissingMethodProblem

@SparkQA
Copy link

SparkQA commented May 1, 2015

Test build #31502 has finished for PR 5430 at commit 60acd84.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@zhzhan
Copy link
Contributor Author

zhzhan commented May 1, 2015

@marmbrus master failure with mima test. Probably caused by #4015

@pwendell
Copy link
Contributor

pwendell commented May 1, 2015

Jenkins, retest this please. Thanks @zhzhan I reverted the patch.

@SparkQA
Copy link

SparkQA commented May 1, 2015

Test build #31506 has finished for PR 5430 at commit 60acd84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@asfgit asfgit closed this in 36a7a68 May 1, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
This is the classes for creating off-heap block storage API. It also includes the migration for Tachyon. The diff seems to be big, but it mainly just rename tachyon to offheap. New implementation for hdfs will be submit for review in spark-6112.

Author: Zhan Zhang <[email protected]>

Closes apache#5430 from zhzhan/SPARK-6479 and squashes the following commits:

60acd84 [Zhan Zhang] minor change to kickoff the test
12f54c9 [Zhan Zhang] solve merge conflicts
a54132c [Zhan Zhang] solve review comments
ffb8e00 [Zhan Zhang] rebase to sparkcontext change
6e121e0 [Zhan Zhang] resolve review comments and restructure blockmanasger code
a7aed6c [Zhan Zhang] add Tachyon migration code
186de31 [Zhan Zhang] initial commit for off-heap block storage api
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
This is the classes for creating off-heap block storage API. It also includes the migration for Tachyon. The diff seems to be big, but it mainly just rename tachyon to offheap. New implementation for hdfs will be submit for review in spark-6112.

Author: Zhan Zhang <[email protected]>

Closes apache#5430 from zhzhan/SPARK-6479 and squashes the following commits:

60acd84 [Zhan Zhang] minor change to kickoff the test
12f54c9 [Zhan Zhang] solve merge conflicts
a54132c [Zhan Zhang] solve review comments
ffb8e00 [Zhan Zhang] rebase to sparkcontext change
6e121e0 [Zhan Zhang] resolve review comments and restructure blockmanasger code
a7aed6c [Zhan Zhang] add Tachyon migration code
186de31 [Zhan Zhang] initial commit for off-heap block storage api
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
This is the classes for creating off-heap block storage API. It also includes the migration for Tachyon. The diff seems to be big, but it mainly just rename tachyon to offheap. New implementation for hdfs will be submit for review in spark-6112.

Author: Zhan Zhang <[email protected]>

Closes apache#5430 from zhzhan/SPARK-6479 and squashes the following commits:

60acd84 [Zhan Zhang] minor change to kickoff the test
12f54c9 [Zhan Zhang] solve merge conflicts
a54132c [Zhan Zhang] solve review comments
ffb8e00 [Zhan Zhang] rebase to sparkcontext change
6e121e0 [Zhan Zhang] resolve review comments and restructure blockmanasger code
a7aed6c [Zhan Zhang] add Tachyon migration code
186de31 [Zhan Zhang] initial commit for off-heap block storage api
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants