Tune Juicer for Cheaha #2

Open

jprorama opened this issue Sep 23, 2023 · 4 comments
Is your feature request related to a problem? Please describe.
Juicer can't use the SLURM scheduler on Cheaha

Describe the solution you'd like
Run juicer.sh in a screen/tmux/byobu session on the login node and have all work submitted as jobs to the cluster.
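A sketch of what that could look like, assuming the SLURM flavor of juicer.sh and the flags documented in the Juicer wiki (genome ID, top directory, queue, restriction site); the paths and values here are illustrative, not a tested Cheaha invocation:

```bash
# On the Cheaha login node: keep the long-running driver alive in a
# detachable session (screen/byobu work the same way).
tmux new -s juicer

# Inside the session, run the SLURM flavor of juicer.sh; it stays resident
# and submits each pipeline stage to the cluster as sbatch jobs.
# Flags follow the Juicer wiki (-g genome, -d top directory, -q partition,
# -s restriction site); values are illustrative.
bash scripts/juicer.sh -g hg38 -d /scratch/$USER/juicer_run -q short -s MboI

# Detach with Ctrl-b d; reattach later with:
tmux attach -t juicer
```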

Describe alternatives you've considered
Running on a single node, but that takes too long.

Additional context
We need to be able to demonstrate successful operation of juicer.sh on Cheaha. This requires customizing the Juicer environment to use the partitions and modules available on Cheaha, as described here (a sketch of the kind of edits follows the link):

https://github.com/aidenlab/juicer/wiki/Running-Juicer-on-a-cluster
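For illustration, the kind of site-specific edits this implies near the top of the SLURM juicer.sh, assuming the upstream script's variable names (queue, long_queue, load_bwa, load_java — verify against the actual source) and Cheaha's partition limits:

```bash
# Point the queue and module variables at what Cheaha provides.
# Variable names are assumed from the upstream SLURM juicer.sh;
# confirm against the actual script before editing.
queue="short"               # Cheaha partition, 12-hour limit
queue_time="12:00:00"
long_queue="long"           # Cheaha partition, 150-hour limit
long_queue_time="150:00:00"

# Cheaha uses Lmod; module names/versions are illustrative.
load_bwa="module load BWA"
load_java="module load Java"
```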

This demonstration needs to include an example data set that can be run quickly but accurately reflects a full-scale run.

The sample data listed at the wiki link above is no longer available.

@jprorama (Author)

Proposed changes are available in pull request #1

@jprorama (Author) commented Sep 23, 2023

The Juicer forum is a potential resource for customizing the SLURM support.

https://groups.google.com/g/3d-genomics/search?q=slurm

@jprorama (Author) commented Sep 23, 2023

I've opened an issue upstream requesting correction or clarification on running Juicer with test data:

aidenlab#331

@jprorama (Author)

Just for guidance: the SLURM version of the chimeric-read de-duplication awk script splits the SAM data roughly every 1 million reads, at a known non-duplicate boundary. It checks whether any of the last 6 fields of the current ("cb") record differ from those of the prior record; if they do, the two reads cannot be duplicates of each other, so it is safe to split there. It writes those reads to a file and submits a job to process them, then repeats until all reads have been submitted for de-duplication, so the maximum time for dedup is roughly the time it takes to process 1 million records. A sketch of the boundary logic follows below.
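A minimal sketch of that split-at-safe-boundary idea, assuming whitespace-separated records with the duplicate-defining key in the last 6 fields; the real script's field positions, counters, file naming, and sbatch submission differ in detail:

```awk
#!/usr/bin/awk -f
# Sketch only: split a sorted read stream into ~1M-read chunks, but only
# at boundaries where adjacent records cannot be duplicates of each other.
BEGIN { chunk = 1000000 }  # target split size: ~1 million reads
{
    # Build a key from the last 6 fields; if it differs from the previous
    # record's key, the two reads cannot be duplicates, so the stream may
    # be split here without separating a duplicate pair across chunks.
    key = $(NF-5) FS $(NF-4) FS $(NF-3) FS $(NF-2) FS $(NF-1) FS $NF
    if (count >= chunk && key != prev) {
        close(out)
        nsplit++   # in the real pipeline, a dedup job is submitted per chunk
        count = 0
    }
    out = sprintf("msplit%d.txt", nsplit)
    print > out
    prev = key
    count++
}
```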

The code should work fine, but we will need to improve the following line in that script: it has a hard-coded email address and host name, which should be driven by parameters.

printf("#!/bin/bash -l\n#SBATCH -o %s/dup-mail.out\n#SBATCH -e %s/dup-mail.err\n#SBATCH -p %s\n#SBATCH -J %s_msplit0\n#SBATCH -d singleton\n#SBATCH -t 1440\n#SBATCH -c 1\n#SBATCH --ntasks=1\ndate;\necho %s %s %s %s | mail -r [email protected] -s \"Juicer pipeline finished successfully @ Voltron\" -t %[email protected];\ndate\n", debugdir, debugdir, queue, groupname, topDir, site, genomeID, genomePath, user) > sscriptname;
