Enable block gzip compression example #85

sagehen03 · 2023-04-18T20:12:24Z

Making this work comes down to two things: specifying the codec when writing from PySpark, and having that codec on the Spark classpath.
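Here is a minimal sketch of both halves. It makes some assumptions. The class name `is.hail.io.compress.BGzipCodec` is assumed to be the block gzip codec shipped in hail.jar, so check it against the jar you built. JSON output is only an example. For text-based sources (`.json()`, `.csv()`, `.text()`), Spark's `compression` option accepts a fully qualified Hadoop `CompressionCodec` class name as well as short names like `gzip`. Spark must be able to load that class, which is why the jar has to be on the classpath. `write_bgzipped` and `is_bgzf` are hypothetical helpers. The second one checks a part file's first bytes for the BGZF header, which is a gzip member with the `FEXTRA` flag and a `BC` extra subfield. Plain gzip output lacks that header.

```python
import struct
from typing import Any

# Assumption: the block gzip codec class shipped in hail.jar; verify against your build.
BGZIP_CODEC = "is.hail.io.compress.BGzipCodec"


def write_bgzipped(df: Any, path: str) -> None:
    """Hypothetical helper: write a pyspark.sql.DataFrame as JSON lines, block-gzipped.

    Spark resolves the codec by class name, so the write fails if the class
    is not on the classpath (e.g. hail.jar missing from /usr/lib/spark/jars/).
    """
    df.write.option("compression", BGZIP_CODEC).json(path)


def is_bgzf(header: bytes) -> bool:
    """Return True if `header` starts with a BGZF (block gzip) block header.

    A BGZF block is a gzip member (magic 1f 8b, CM=8) with FLG=4 (FEXTRA)
    and an extra subfield SI1='B', SI2='C', SLEN=2 holding the block size.
    """
    if len(header) < 12 or header[:4] != b"\x1f\x8b\x08\x04":
        return False
    # XLEN: little-endian length of the extra field, at bytes 10-11.
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra-field subfields looking for the 'BC' one.
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack_from("<H", extra, pos + 2)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False
```

After a write, you could read the first kilobyte or so of one `part-*` file in binary mode and pass it to `is_bgzf`. A plain gzip header with `FLG` = 0 (for example `1f 8b 08 00 ...`) is rejected.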

sagehen03 · 2023-04-18T20:16:52Z

bioindex/src/main/resources/compression.sh

@@ -0,0 +1,3 @@
+#!/bin/bash -xe
+
+sudo aws s3 cp s3://dig-data-registry/hail.jar /usr/lib/spark/jars/


I built hail.jar from the Hail source (it needs to be compiled with Java 8, since that's what our EMR clusters use) and then uploaded it to S3. We probably need a better location in S3, but I used this one for now since it's not in production use. This solution also relies on EMR continuing to put /usr/lib/spark/jars on the classpath.
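The EMR clusters run Java 8, so a jar built for a newer JDK target would fail to load there, typically with `UnsupportedClassVersionError`. The sketch below shows a small pre-upload check. It is not part of this PR, and the helper names are hypothetical. It reads each class file's header, where major version 52 corresponds to Java 8. It skips `module-info.class` and `META-INF/versions/` entries, because those can legitimately carry newer versions inside a Java 8 compatible jar.

```python
import struct
import zipfile
from typing import Optional

# Class-file major version 52 corresponds to Java 8 (53 = Java 9, 55 = Java 11).
JAVA8_MAJOR = 52


def class_major_version(header: bytes) -> Optional[int]:
    """Return the major version from the first 8 bytes of a .class file, or None."""
    if len(header) < 8 or header[:4] != b"\xca\xfe\xba\xbe":
        return None
    # Layout: magic (4 bytes), minor_version (2), major_version (2), big-endian.
    return struct.unpack(">H", header[6:8])[0]


def max_class_version(jar_path: str) -> int:
    """Highest major version among the jar's classes, ignoring entries that may
    legitimately be newer in a Java 8 jar (module-info, multi-release)."""
    highest = 0
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            if name.endswith("module-info.class") or name.startswith("META-INF/versions/"):
                continue
            with jar.open(name) as entry:
                major = class_major_version(entry.read(8))
            if major is not None:
                highest = max(highest, major)
    return highest


def runs_on_java8(jar_path: str) -> bool:
    """True if every checked class was compiled for Java 8 or earlier."""
    return max_class_version(jar_path) <= JAVA8_MAJOR
```

For example, `runs_on_java8("hail.jar")` should return `True` before the jar is copied to S3. This check only confirms the bytecode target. It cannot catch calls to APIs that exist only in newer JDKs.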

So for things like this we tend to use s3://dig-aggregator-data/bin/

psmadbec

Magical. For generating hail.jar, we'll want either written instructions or, more likely, a script in dig-analysis-data/scripts. That's where I put anything with a specific generation sequence, in case someone needs to regenerate or update the file itself.

psmadbec · 2023-04-18T21:05:32Z

So for things like this we tend to use s3://dig-aggregator-data/bin/

seems to work

14457d6

sagehen03 commented Apr 18, 2023

removing unused import

32e78b8

psmadbec approved these changes Apr 18, 2023
