Spark Evaluation Results

Tom White edited this page Nov 3, 2017 · 1 revision

This page records some of the results of running the Spark evaluation scripts.

| Date | Pipeline | Input data | Cluster | Command (see below) | Time (min) | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 2017-11-02 | MD, BQSR, HC | Exome (18.4 GB) | 10 nodes n1-standard-16 | 1 | 28.45 | |
| 2017-11-02 | Reads Pipeline | Exome (18.4 GB) | 10 nodes n1-standard-16 | 2 | 24.08 | 15% faster |
| 2017-11-02 | MD, BQSR, HC | Genome (133.6 GB) | 20 nodes n1-standard-16 | 3 | 145.92 | |
| 2017-11-02 | Reads Pipeline | Genome (133.6 GB) | 20 nodes n1-standard-16 | 4 | 99.29 | 32% faster |
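
The Notes column appears to give the reduction in wall-clock time of the Reads Pipeline run relative to the MD, BQSR, HC run on the same input. Below is a minimal sketch of that arithmetic under that assumption. `pct_faster` is a hypothetical helper written for illustration and is not part of the evaluation scripts.

```shell
# Hypothetical helper (not part of the evaluation scripts).
# Prints the percent reduction in run time, rounded to a whole number:
#   (baseline - candidate) / baseline * 100
pct_faster() {
  awk -v base="$1" -v cand="$2" \
    'BEGIN { printf "%.0f\n", (base - cand) / base * 100 }'
}

# Times in minutes, taken from the table above.
pct_faster 28.45 24.08     # exome:  prints 15
pct_faster 145.92 99.29    # genome: prints 32
```

Both values match the table, which suggests the percentages were computed against the MD, BQSR, HC time as the baseline.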

Commands

# 1
nohup ./run_gcs_cluster.sh copy_exome_to_hdfs_on_gcs.sh exome_md-bqsr-hc_hdfs.sh &
# 2
nohup ./run_gcs_cluster.sh copy_exome_to_hdfs_on_gcs.sh exome_reads-pipeline_hdfs.sh &
# 3
NUM_WORKERS=20 nohup ./run_gcs_cluster.sh copy_genome_to_hdfs_on_gcs.sh genome_md-bqsr-hc_hdfs.sh &
# 4
NUM_WORKERS=20 nohup ./run_gcs_cluster.sh copy_genome_to_hdfs_on_gcs.sh genome_reads-pipeline_hdfs.sh &