Re-build human genome annotations with GDC fasta #136

Merged: 26 commits, Aug 27, 2024

@kelly-sovacool kelly-sovacool commented Jun 21, 2024

Changes

  • Rebuilt hg19 and hg38 annotations with the GDC reference fasta, which includes additional sequences. These are stored in a shared location for use in other pipelines too: /data/CCBR_Pipeliner/db/PipeDB/GDC_refs/
  • Copied files to FRCE: /mnt/projects/CCBR-Pipelines/db/GDC_refs

See the snakemake workflow here for how these were built: https://github.com/CCBR/build-renee-refs
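
The PR does not say how the FRCE copy was checked. One possible approach, sketched here under the assumption that GNU coreutils are available on both systems, is to write a checksum manifest next to the source files and verify it against the copy. `make_manifest` and `verify_manifest` are hypothetical helper names, not part of RENEE.

```shell
# Print a sorted SHA-256 manifest (paths relative to the directory) for every file under $1.
make_manifest() (
    cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum
)

# Check the files under $1 against a manifest written by make_manifest.
# Only files listed in the manifest are checked; extra files are not reported.
verify_manifest() (
    dir=$1
    manifest=$2
    # Make the manifest path absolute before changing directory.
    case $manifest in
        /*) ;;
        *) manifest=$PWD/$manifest ;;
    esac
    cd "$dir" && sha256sum --quiet -c "$manifest"
)
```

For example, `make_manifest /data/CCBR_Pipeliner/db/PipeDB/GDC_refs > GDC_refs.sha256` on Biowulf, then `verify_manifest /mnt/projects/CCBR-Pipelines/db/GDC_refs GDC_refs.sha256` on FRCE after transferring the manifest file.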

Issues

PR Checklist

(Strikethrough any points that are not applicable.)

  • This comment contains a description of changes with justifications, with any relevant issues linked.
  • Update docs if there are any API changes.
  • Update CHANGELOG.md with a short description of any user-facing changes and reference the PR number. Guidelines: https://keepachangelog.com/en/1.1.0/

resolves #129

these files are now in a shared location instead of the renee resources dir since they are useful for other pipelines too
@kopardev kopardev added the RENEE RepoName label Jun 21, 2024
also suggest running `renee run --help` for updated list
@kelly-sovacool kelly-sovacool marked this pull request as ready for review June 24, 2024 15:23
@kelly-sovacool kelly-sovacool changed the title from "Re-build hg38 annotations with GDC fasta" to "Re-build human genome annotations with GDC fasta" Aug 1, 2024
kelly-sovacool commented Aug 2, 2024

Running tests.

RENEE_REPO=/data/CCBR_Pipeliner/Pipelines/RENEE/renee-dev-sovacool
# Run the test data through every human (hg*) genome config available on Biowulf.
for genome in $(ls "$RENEE_REPO"/config/genomes/biowulf/ | grep hg | sed 's/\.json$//'); do
    # Capture stdout and stderr of each submission in a per-genome log file.
    "$RENEE_REPO"/bin/renee run \
        --input "$RENEE_REPO"/.tests/*.R1.fastq.gz \
        --genome "$genome" \
        --mode slurm \
        --output "/data/$USER/renee_test_$genome" \
        --sif-cache /data/CCBR_Pipeliner/SIFS \
        &> "/data/$USER/renee_shell.${genome}.out"
done
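
The genome list in the loop above is derived from the config file names. A minimal sketch of that discovery step as a reusable function follows. `list_genomes` and its prefix argument are hypothetical names, not part of RENEE, and unlike the `grep hg` in the loop it matches only names that start with the given prefix.

```shell
# Print the genome names (config file names without .json) in directory $1
# whose names start with prefix $2, one per line.
list_genomes() (
    dir=$1
    prefix=$2
    for config in "$dir"/*.json; do
        # Skip the unexpanded pattern when the directory has no .json files.
        [ -e "$config" ] || continue
        name=$(basename "$config" .json)
        case $name in
            "$prefix"*) printf '%s\n' "$name" ;;
        esac
    done
)
```

With this helper, the loop header could read `for genome in $(list_genomes "$RENEE_REPO/config/genomes/biowulf" hg); do`.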

Job status

Generate table
cd /data/sovacoolkl/
# Print one table row per log file: the genome name and the master job ID recorded in the log.
for f in renee_shell.*.out; do
    genome=$(echo "$f" | sed 's/^renee_shell\.//; s/\.out$//')
    jobid=$(grep master "$f" | sed 's/.*: //')
    echo "| $genome | $jobid |  |"
done
| genome | slurm id | status |
|---|---|---|
| hg19_19 | 31994530 | RUNNING |
| hg19_36lift37 | 31994535 | RUNNING |
| hg38_30 | 31994547 | RUNNING |
| hg38_34 | 31994560 | RUNNING |
| hg38_36 | 31994162 | RUNNING |
| hg38_38 | 31994566 | RUNNING |
| hg38_41 | 31994570 | RUNNING |
| hg38_45 | 31994583 | RUNNING |
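
The comment does not say how the status column was filled in. One way to do it, sketched here on the assumption that Slurm's `sacct` is available on the submitting host, is to look up each job ID and print the finished rows. `job_state` and `status_rows` are hypothetical helper names.

```shell
# Print the Slurm state of job $1 (allocation-level record only, no header line).
job_state() {
    sacct -j "$1" -X -n -o State </dev/null | awk 'NR == 1 { print $1 }'
}

# Read "genome jobid" pairs from stdin and print one table row per pair.
status_rows() {
    while read -r genome jobid; do
        printf '| %s | %s | %s |\n' "$genome" "$jobid" "$(job_state "$jobid")"
    done
}
```

For example, `printf 'hg38_45 31994583\n' | status_rows` prints a single row with the current state of that job.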

Update

All pipeline runs completed successfully. Jobby initially failed because it wasn't on my PATH when I submitted the runs, but after running jobby manually I verified that every job finished without errors.
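
The jobby failure described above came from a tool missing from PATH at submission time. A preflight check along these lines, run before submitting, is one way to catch that earlier. `require_tools` is a hypothetical helper, not part of RENEE.

```shell
# Return non-zero and name each missing tool if any argument is not found on PATH.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing from PATH: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `require_tools jobby || exit 1` placed before the submission loop would stop early instead of failing after the jobs are queued.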

@kelly-sovacool kelly-sovacool marked this pull request as draft August 2, 2024 18:40
@kopardev kopardev left a comment

Please see comments before merging.

@kelly-sovacool kelly-sovacool marked this pull request as ready for review August 5, 2024 17:44
@kelly-sovacool

@kopardev this is ready for your re-review

@kelly-sovacool kelly-sovacool merged commit 1dd9309 into main Aug 27, 2024
5 checks passed
@kelly-sovacool kelly-sovacool deleted the iss-129 branch August 27, 2024 19:29
Successfully merging this pull request may close these issues.

  • change fasta for hg19
  • Change human reference fasta
  • set default human genome+annotation combo to match GDC
2 participants