
Shutdown hook called before final status was reported. #2666

Closed
david-wb opened this issue May 5, 2017 · 4 comments


david-wb commented May 5, 2017

Hi,
I am experimenting with submitting a PrintReadsSpark job to a Spark-on-YARN cluster in AWS. I run the job with the following command:

spark-submit --deploy-mode cluster --class org.broadinstitute.hellbender.Main --master yarn gatk-package-4.alpha.2-248-gcd449bf-SNAPSHOT-spark.jar PrintReadsSpark -I hdfs://chr1.bam -O hdfs://output.bam

I can see from the output files that the job finished successfully; however, the cluster reports that it failed, showing the following error message:

17/05/05 06:03:53 INFO ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)

I believe this may be due to the System.exit(0) statement at line 144 in hellbender.Main, though I am not sure.
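To illustrate what I think is happening (this is just a sketch, not the actual hellbender.Main source): in yarn-cluster mode the driver runs inside the ApplicationMaster JVM, so a System.exit() there fires the JVM shutdown hooks before the ApplicationMaster has unregistered with a final status.

// Illustrative sketch only -- not the real hellbender.Main code.
public final class Main {
    public static void main(final String[] args) {
        runTool(args);   // the Spark job itself completes successfully
        System.exit(0);  // fires the JVM shutdown hooks before the
                         // ApplicationMaster can report SUCCEEDED, so YARN
                         // records exitCode 16 ("Shutdown hook called before
                         // final status was reported")
    }

    private static void runTool(final String[] args) {
        // ... tool logic elided ...
    }
}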

Here is a more complete snippet from the stderr log.

17/05/05 06:03:50 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/05 06:03:52 WARN DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1249)
	at java.lang.Thread.join(Thread.java:1323)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
17/05/05 06:03:52 INFO FileOutputCommitter: Saved output of task 'attempt_20170505060336_0011_r_000001_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output.bam.parts/_temporary/0/task_20170505060336_0011_r_000001
17/05/05 06:03:52 INFO Executor: Finished task 1.0 in stage 2.0 (TID 3). 1921 bytes result sent to driver
17/05/05 06:03:52 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 3) in 10260 ms on localhost (executor driver) (1/4)
17/05/05 06:03:53 INFO FileOutputCommitter: Saved output of task 'attempt_20170505060336_0011_r_000000_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output.bam.parts/_temporary/0/task_20170505060336_0011_r_000000
17/05/05 06:03:53 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 1921 bytes result sent to driver
17/05/05 06:03:53 INFO FileOutputCommitter: Saved output of task 'attempt_20170505060336_0011_r_000003_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output.bam.parts/_temporary/0/task_20170505060336_0011_r_000003
17/05/05 06:03:53 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 10599 ms on localhost (executor driver) (2/4)
17/05/05 06:03:53 INFO Executor: Finished task 3.0 in stage 2.0 (TID 5). 1921 bytes result sent to driver
17/05/05 06:03:53 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 5) in 10599 ms on localhost (executor driver) (3/4)
17/05/05 06:03:53 INFO FileOutputCommitter: Saved output of task 'attempt_20170505060336_0011_r_000002_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output.bam.parts/_temporary/0/task_20170505060336_0011_r_000002
17/05/05 06:03:53 INFO Executor: Finished task 2.0 in stage 2.0 (TID 4). 1921 bytes result sent to driver
17/05/05 06:03:53 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 4) in 10792 ms on localhost (executor driver) (4/4)
17/05/05 06:03:53 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
17/05/05 06:03:53 INFO DAGScheduler: ResultStage 2 (saveAsNewAPIHadoopFile at ReadsSparkSink.java:202) finished in 10.796 s
17/05/05 06:03:53 INFO DAGScheduler: Job 1 finished: saveAsNewAPIHadoopFile at ReadsSparkSink.java:202, took 17.010114 s
17/05/05 06:03:53 INFO SparkUI: Stopped Spark web UI at http://172.30.0.122:35794
17/05/05 06:03:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/05 06:03:53 INFO MemoryStore: MemoryStore cleared
17/05/05 06:03:53 INFO BlockManager: BlockManager stopped
17/05/05 06:03:53 INFO BlockManagerMaster: BlockManagerMaster stopped
17/05/05 06:03:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/05 06:03:53 INFO SparkContext: Successfully stopped SparkContext
[May 5, 2017 6:03:53 AM UTC] org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark done. Elapsed time: 0.40 minutes.
Runtime.totalMemory()=799080448
17/05/05 06:03:53 INFO ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/05/05 06:03:53 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Shutdown hook called before final status was reported.)
17/05/05 06:03:53 INFO ApplicationMaster: Deleting staging directory hdfs://ip-172-30-0-86.ec2.internal:8020/user/hadoop/.sparkStaging/application_1493961816416_0002
17/05/05 06:03:53 INFO ShutdownHookManager: Shutdown hook called
17/05/05 06:03:53 INFO ShutdownHookManager: Deleting directory /mnt1/yarn/usercache/hadoop/appcache/application_1493961816416_0002/spark-23bc1c26-8133-475d-8418-1def58e976fd
17/05/05 06:03:53 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1493961816416_0002/spark-edb79d30-a7d3-4ed2-ad61-2dc8ea95c5b1

Thanks for your help with this issue!

lbergelson (Member) commented

Hi @david-wb

I reformatted your comment slightly to make the stack trace more legible, I hope you don't mind.

I suspect your intuition about the System.exit(0) is entirely correct. We probably haven't noticed it because we typically run in YARN client mode, and you're running in cluster mode.

Two questions:

  1. How often does it happen? Can you reproduce it reliably?
  2. Have you examined the output files to make sure they are correct and not truncated?

It looks like we'll probably have to add a check and wait for the Spark context to shut down properly.
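Roughly, I imagine the fix looking something like this (a sketch of the idea only; the class and method names are illustrative, not the actual GATK code):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch: JavaSparkContext.stop() blocks until the SparkContext has shut
// down, which gives the YARN ApplicationMaster the chance to unregister
// with a final SUCCEEDED status before the JVM exits.
public final class GracefulShutdownSketch {
    public static void main(final String[] args) {
        final JavaSparkContext ctx =
                new JavaSparkContext(new SparkConf().setAppName("PrintReadsSpark"));
        try {
            // ... run the pipeline ...
        } finally {
            ctx.stop(); // wait for a clean shutdown before the JVM exits
        }
        // Only exit (or simply return) after the context is fully stopped.
    }
}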


david-wb commented May 5, 2017

  1. Yes, it is reproducible. It happens every time.

  2. No, I have not examined the output files, but I have run the same app on my local machine with:

spark-submit --master local[2] gatk-package-4.alpha.2-252-gf627ed4-SNAPSHOT-spark.jar PrintReadsSpark -I chr1.bam -O output.bam

The resulting output.bam is nearly the same size in bytes as the output from the YARN cluster, so it doesn't appear to be truncated, and the .bai index files are identical in size.
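If it's useful, a stricter check than comparing sizes would be to iterate the whole file with htsjdk. This is just a sketch, but reading every record should force decompression through the final BGZF blocks, so a truncated BAM would fail partway through:

import java.io.File;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SamReader;
import htsjdk.samtools.SamReaderFactory;

// Sketch: iterate every record so htsjdk must read the whole BGZF stream;
// a truncated file should throw an exception before the loop finishes.
public final class BamTruncationCheck {
    public static void main(final String[] args) throws Exception {
        try (final SamReader reader =
                     SamReaderFactory.makeDefault().open(new File(args[0]))) {
            long n = 0;
            for (final SAMRecord rec : reader) {
                n++;
            }
            System.out.println(n + " records read; file appears intact");
        }
    }
}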

Here is the full stderr log.

Log Type: stderr
Log Upload Time: Fri May 05 17:04:00 +0000 2017
Log Length: 18471
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/37/__spark_libs__6987413740287883326.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/05/05 17:03:30 INFO SignalUtils: Registered signal handler for TERM
17/05/05 17:03:30 INFO SignalUtils: Registered signal handler for HUP
17/05/05 17:03:30 INFO SignalUtils: Registered signal handler for INT
17/05/05 17:03:30 INFO ApplicationMaster: Preparing Local resources
17/05/05 17:03:32 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1493961816416_0010_000002
17/05/05 17:03:32 INFO SecurityManager: Changing view acls to: yarn,hadoop
17/05/05 17:03:32 INFO SecurityManager: Changing modify acls to: yarn,hadoop
17/05/05 17:03:32 INFO SecurityManager: Changing view acls groups to: 
17/05/05 17:03:32 INFO SecurityManager: Changing modify acls groups to: 
17/05/05 17:03:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users  with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
17/05/05 17:03:32 INFO ApplicationMaster: Starting the user application in a separate Thread
17/05/05 17:03:32 INFO ApplicationMaster: Waiting for spark context initialization...
[May 5, 2017 5:03:35 PM UTC] org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark  --output hdfs:///output2.bam --input hdfs:///chr1.bam  --readValidationStringency SILENT --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --bamPartitionSize 0 --disableSequenceDictionaryValidation false --shardedOutput false --numReducers 0 --sparkMaster local[*] --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false
[May 5, 2017 5:03:35 PM UTC] Executing as yarn@ip-172-30-0-122 on Linux 4.4.35-33.55.amzn1.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b13; Version: Version:4.alpha.2-252-gf627ed4-SNAPSHOT
17/05/05 17:03:35 INFO SparkContext: Running Spark version 2.1.0
17/05/05 17:03:35 INFO SecurityManager: Changing view acls to: yarn,hadoop
17/05/05 17:03:35 INFO SecurityManager: Changing modify acls to: yarn,hadoop
17/05/05 17:03:35 INFO SecurityManager: Changing view acls groups to: 
17/05/05 17:03:35 INFO SecurityManager: Changing modify acls groups to: 
17/05/05 17:03:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users  with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
17/05/05 17:03:35 INFO Utils: Successfully started service 'sparkDriver' on port 42358.
17/05/05 17:03:35 INFO SparkEnv: Registering MapOutputTracker
17/05/05 17:03:35 INFO SparkEnv: Registering BlockManagerMaster
17/05/05 17:03:35 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/05/05 17:03:35 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/05/05 17:03:35 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/hadoop/appcache/application_1493961816416_0010/blockmgr-356a706f-2395-4ef6-985a-d3a7d7b01a8a
17/05/05 17:03:35 INFO DiskBlockManager: Created local directory at /mnt1/yarn/usercache/hadoop/appcache/application_1493961816416_0010/blockmgr-89521127-7e39-47e4-b57d-24ded3655d9e
17/05/05 17:03:35 INFO MemoryStore: MemoryStore started with capacity 414.4 MB
17/05/05 17:03:35 INFO SparkEnv: Registering OutputCommitCoordinator
17/05/05 17:03:36 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
17/05/05 17:03:36 INFO Utils: Successfully started service 'SparkUI' on port 46483.
17/05/05 17:03:36 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.30.0.122:46483
17/05/05 17:03:36 INFO Executor: Starting executor ID driver on host localhost
17/05/05 17:03:36 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34295.
17/05/05 17:03:36 INFO NettyBlockTransferService: Server created on 172.30.0.122:34295
17/05/05 17:03:36 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/05/05 17:03:36 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.30.0.122, 34295, None)
17/05/05 17:03:36 INFO BlockManagerMasterEndpoint: Registering block manager 172.30.0.122:34295 with 414.4 MB RAM, BlockManagerId(driver, 172.30.0.122, 34295, None)
17/05/05 17:03:36 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.30.0.122, 34295, None)
17/05/05 17:03:36 INFO BlockManager: external shuffle service port = 7337
17/05/05 17:03:36 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.30.0.122, 34295, None)
17/05/05 17:03:36 INFO EventLoggingListener: Logging events to hdfs:///var/log/spark/apps/local-1494003816349
17/05/05 17:03:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 304.1 KB, free 414.2 MB)
17/05/05 17:03:38 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.0 KB, free 414.1 MB)
17/05/05 17:03:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.30.0.122:34295 (size: 26.0 KB, free: 414.4 MB)
17/05/05 17:03:38 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at ReadsSparkSource.java:109
17/05/05 17:03:38 INFO FileInputFormat: Total input paths to process : 1
17/05/05 17:03:38 INFO SparkContext: Starting job: sortByKey at ReadsSparkSink.java:244
17/05/05 17:03:38 INFO DAGScheduler: Got job 0 (sortByKey at ReadsSparkSink.java:244) with 1 output partitions
17/05/05 17:03:38 INFO DAGScheduler: Final stage: ResultStage 0 (sortByKey at ReadsSparkSink.java:244)
17/05/05 17:03:38 INFO DAGScheduler: Parents of final stage: List()
17/05/05 17:03:38 INFO DAGScheduler: Missing parents: List()
17/05/05 17:03:38 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[7] at sortByKey at ReadsSparkSink.java:244), which has no missing parents
17/05/05 17:03:38 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 466.9 KB, free 413.7 MB)
17/05/05 17:03:39 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 132.8 KB, free 413.5 MB)
17/05/05 17:03:39 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.30.0.122:34295 (size: 132.8 KB, free: 414.3 MB)
17/05/05 17:03:39 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:997
17/05/05 17:03:39 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[7] at sortByKey at ReadsSparkSink.java:244)
17/05/05 17:03:39 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/05/05 17:03:39 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6025 bytes)
17/05/05 17:03:39 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/05/05 17:03:39 INFO NewHadoopRDD: Input split: hdfs://ip-172-30-0-86.ec2.internal:8020/chr1.bam:2825191424-2410664558591
17/05/05 17:03:41 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 177741 bytes result sent to driver
17/05/05 17:03:41 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2508 ms on localhost (executor driver) (1/1)
17/05/05 17:03:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/05/05 17:03:41 INFO DAGScheduler: ResultStage 0 (sortByKey at ReadsSparkSink.java:244) finished in 2.530 s
17/05/05 17:03:41 INFO DAGScheduler: Job 0 finished: sortByKey at ReadsSparkSink.java:244, took 2.716663 s
17/05/05 17:03:41 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.8 MB, free 410.7 MB)
17/05/05 17:03:41 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 105.4 KB, free 410.6 MB)
17/05/05 17:03:41 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.30.0.122:34295 (size: 105.4 KB, free: 414.2 MB)
17/05/05 17:03:41 INFO SparkContext: Created broadcast 2 from broadcast at ReadsSparkSink.java:194
17/05/05 17:03:41 INFO deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
17/05/05 17:03:41 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/05 17:03:41 INFO SparkContext: Starting job: saveAsNewAPIHadoopFile at ReadsSparkSink.java:202
17/05/05 17:03:41 INFO DAGScheduler: Registering RDD 5 (mapToPair at ReadsSparkSink.java:236)
17/05/05 17:03:41 INFO DAGScheduler: Got job 1 (saveAsNewAPIHadoopFile at ReadsSparkSink.java:202) with 4 output partitions
17/05/05 17:03:41 INFO DAGScheduler: Final stage: ResultStage 2 (saveAsNewAPIHadoopFile at ReadsSparkSink.java:202)
17/05/05 17:03:41 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
17/05/05 17:03:41 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 1)
17/05/05 17:03:41 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at mapToPair at ReadsSparkSink.java:236), which has no missing parents
17/05/05 17:03:41 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 470.3 KB, free 410.2 MB)
17/05/05 17:03:41 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 134.6 KB, free 410.0 MB)
17/05/05 17:03:41 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.30.0.122:34295 (size: 134.6 KB, free: 414.1 MB)
17/05/05 17:03:41 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:997
17/05/05 17:03:41 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at mapToPair at ReadsSparkSink.java:236)
17/05/05 17:03:41 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/05/05 17:03:41 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, ANY, 6028 bytes)
17/05/05 17:03:41 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
17/05/05 17:03:42 INFO NewHadoopRDD: Input split: hdfs://ip-172-30-0-86.ec2.internal:8020/chr1.bam:2825191424-2410664558591
17/05/05 17:03:48 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1339 bytes result sent to driver
17/05/05 17:03:48 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 6173 ms on localhost (executor driver) (1/1)
17/05/05 17:03:48 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/05/05 17:03:48 INFO DAGScheduler: ShuffleMapStage 1 (mapToPair at ReadsSparkSink.java:236) finished in 6.175 s
17/05/05 17:03:48 INFO DAGScheduler: looking for newly runnable stages
17/05/05 17:03:48 INFO DAGScheduler: running: Set()
17/05/05 17:03:48 INFO DAGScheduler: waiting: Set(ResultStage 2)
17/05/05 17:03:48 INFO DAGScheduler: failed: Set()
17/05/05 17:03:48 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[11] at mapToPair at ReadsSparkSink.java:273), which has no missing parents
17/05/05 17:03:48 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 541.3 KB, free 409.5 MB)
17/05/05 17:03:48 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 160.1 KB, free 409.4 MB)
17/05/05 17:03:48 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.30.0.122:34295 (size: 160.1 KB, free: 413.9 MB)
17/05/05 17:03:48 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:997
17/05/05 17:03:48 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 2 (MapPartitionsRDD[11] at mapToPair at ReadsSparkSink.java:273)
17/05/05 17:03:48 INFO TaskSchedulerImpl: Adding task set 2.0 with 4 tasks
17/05/05 17:03:48 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, ANY, 5766 bytes)
17/05/05 17:03:48 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 3, localhost, executor driver, partition 1, ANY, 5766 bytes)
17/05/05 17:03:48 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 4, localhost, executor driver, partition 2, ANY, 5766 bytes)
17/05/05 17:03:48 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 5, localhost, executor driver, partition 3, ANY, 5766 bytes)
17/05/05 17:03:48 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
17/05/05 17:03:48 INFO Executor: Running task 1.0 in stage 2.0 (TID 3)
17/05/05 17:03:48 INFO Executor: Running task 2.0 in stage 2.0 (TID 4)
17/05/05 17:03:48 INFO Executor: Running task 3.0 in stage 2.0 (TID 5)
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 10 ms
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 11 ms
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 12 ms
17/05/05 17:03:48 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 12 ms
17/05/05 17:03:51 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 172.30.0.122:34295 in memory (size: 132.8 KB, free: 414.0 MB)
17/05/05 17:03:51 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 172.30.0.122:34295 in memory (size: 134.6 KB, free: 414.2 MB)
17/05/05 17:03:52 INFO ExternalSorter: Thread 75 spilling in-memory map of 107.7 MB to disk (1 time so far)
17/05/05 17:03:52 INFO ExternalSorter: Thread 72 spilling in-memory map of 107.8 MB to disk (1 time so far)
17/05/05 17:03:52 INFO ExternalSorter: Thread 77 spilling in-memory map of 107.7 MB to disk (1 time so far)
17/05/05 17:03:52 INFO ExternalSorter: Thread 76 spilling in-memory map of 107.7 MB to disk (1 time so far)
17/05/05 17:03:54 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/05 17:03:54 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/05 17:03:54 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/05 17:03:55 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/05 17:03:58 INFO FileOutputCommitter: Saved output of task 'attempt_20170505170341_0011_r_000003_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output2.bam.parts/_temporary/0/task_20170505170341_0011_r_000003
17/05/05 17:03:58 INFO Executor: Finished task 3.0 in stage 2.0 (TID 5). 1921 bytes result sent to driver
17/05/05 17:03:58 INFO FileOutputCommitter: Saved output of task 'attempt_20170505170341_0011_r_000001_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output2.bam.parts/_temporary/0/task_20170505170341_0011_r_000001
17/05/05 17:03:58 INFO Executor: Finished task 1.0 in stage 2.0 (TID 3). 1921 bytes result sent to driver
17/05/05 17:03:58 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 5) in 9977 ms on localhost (executor driver) (1/4)
17/05/05 17:03:58 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 3) in 9994 ms on localhost (executor driver) (2/4)
17/05/05 17:03:58 INFO FileOutputCommitter: Saved output of task 'attempt_20170505170341_0011_r_000002_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output2.bam.parts/_temporary/0/task_20170505170341_0011_r_000002
17/05/05 17:03:58 INFO Executor: Finished task 2.0 in stage 2.0 (TID 4). 1921 bytes result sent to driver
17/05/05 17:03:58 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 4) in 10104 ms on localhost (executor driver) (3/4)
17/05/05 17:03:58 INFO FileOutputCommitter: Saved output of task 'attempt_20170505170341_0011_r_000000_0' to hdfs://ip-172-30-0-86.ec2.internal:8020/output2.bam.parts/_temporary/0/task_20170505170341_0011_r_000000
17/05/05 17:03:58 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 1921 bytes result sent to driver
17/05/05 17:03:58 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 10369 ms on localhost (executor driver) (4/4)
17/05/05 17:03:58 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
17/05/05 17:03:58 INFO DAGScheduler: ResultStage 2 (saveAsNewAPIHadoopFile at ReadsSparkSink.java:202) finished in 10.370 s
17/05/05 17:03:58 INFO DAGScheduler: Job 1 finished: saveAsNewAPIHadoopFile at ReadsSparkSink.java:202, took 16.702399 s
17/05/05 17:03:58 INFO SparkUI: Stopped Spark web UI at http://172.30.0.122:46483
17/05/05 17:03:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/05 17:03:59 INFO MemoryStore: MemoryStore cleared
17/05/05 17:03:59 INFO BlockManager: BlockManager stopped
17/05/05 17:03:59 INFO BlockManagerMaster: BlockManagerMaster stopped
17/05/05 17:03:59 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/05 17:03:59 INFO SparkContext: Successfully stopped SparkContext
[May 5, 2017 5:03:59 PM UTC] org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark done. Elapsed time: 0.40 minutes.
Runtime.totalMemory()=799080448
17/05/05 17:03:59 INFO ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/05/05 17:03:59 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Shutdown hook called before final status was reported.)
17/05/05 17:03:59 INFO ApplicationMaster: Deleting staging directory hdfs://ip-172-30-0-86.ec2.internal:8020/user/hadoop/.sparkStaging/application_1493961816416_0010
17/05/05 17:03:59 INFO ShutdownHookManager: Shutdown hook called
17/05/05 17:03:59 INFO ShutdownHookManager: Deleting directory /mnt1/yarn/usercache/hadoop/appcache/application_1493961816416_0010/spark-223a9e8b-0fe9-41f0-8bed-f843978f1882
17/05/05 17:03:59 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1493961816416_0010/spark-573a9c53-e268-4f3b-8907-1f35e5839788

lbergelson added this to the 4.0 release milestone May 9, 2017
lbergelson (Member) commented

@david-wb Thanks for the detailed information. That's really helpful. I don't think we're going to be able to get to this for the next few weeks since we're pretty swamped with work for our beta release. Hopefully it's not blocking you since it seems like the output of the tools is correct. We'll get back to you when we have a proposed PR for a fix that you can test.


david-wb commented May 9, 2017

Sounds good. I can also run in client mode for the time being.
