Update module sratools prefetch #305

Merged
merged 6 commits into from
May 6, 2024

Conversation

@suhrig suhrig commented Apr 8, 2024

This PR fixes #285. Downloads using sratools often resulted in corrupt files because vdb-validate, the tool used to check file integrity, is not always reliable. This PR updates the module sratools/prefetch; the updated module performs a manual MD5 sum check in cases where vdb-validate is unreliable (when the archive contains no checksums).

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • CHANGELOG.md is updated.

github-actions bot commented Apr 8, 2024

This PR is against the master branch ❌

  • Do not close this PR
  • Click Edit and change the base to dev
  • This CI test will remain failed until you push a new commit

Hi @suhrig,

It looks like this pull request has been made against the suhrig/fetchngs master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the suhrig/fetchngs dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

@suhrig suhrig changed the base branch from master to dev April 8, 2024 09:05

github-actions bot commented Apr 8, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 3056a3a

+| ✅ 155 tests passed       |+
#| ❔   5 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml

❔ Tests ignored:

  • files_exist - File is ignored: conf/modules.config
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • actions_ci - actions_ci
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/fetchngs/fetchngs/.github/workflows/awstest.yml
  • multiqc_config - 'assets/multiqc_config.yml' not found

✅ Tests passed:

Run details

  • nf-core/tools version 2.13.1
  • Run at 2024-04-08 12:52:28

Member

@maxulysse maxulysse left a comment

Looking good, all tests are OK.
Just checking some extra things before merging.
Thanks a lot @suhrig for the work

Author

suhrig commented Apr 27, 2024

Hi @maxulysse, can I help you with the additional checks? All I can say is I have run this several times now without any issues. Regards, Sebastian

Comment on lines +50 to +59
# check file integrity using vdb-validate or (when archive contains no checksums) md5sum
vdb-validate !{id} > vdb-validate_result.txt 2>&1 || exit 1
if grep -q "checksums missing" vdb-validate_result.txt; then
    VALID_MD5SUMS=$(curl --silent --fail --location --retry 3 --retry-delay 60 'https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?filetype=run&acc=!{id}')
    LOCAL_MD5SUMS=$(md5sum !{id}/* | cut -f1 -d' ')
    if ! grep -q -F -f <(echo "$LOCAL_MD5SUMS") <(echo "$VALID_MD5SUMS"); then
        echo "MD5 sum check failed" 1>&2
        exit 1
    fi
fi
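
One property of the check quoted above is worth spelling out: `grep -q -F -f` succeeds as soon as any one of the local checksums appears in the response, so a directory holding several files passes when a single file matches. The following is a minimal sketch of a stricter variant that requires every file to match. It is not the code that was merged; the function name `all_md5sums_listed`, the accession `SRR0000000`, and the JSON-like listing are made up for illustration, and the listing is read from a local file instead of the NCBI endpoint so that the sketch runs offline.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the module): succeed only if every file in
# directory $1 has an MD5 sum that occurs somewhere in the text file $2.
all_md5sums_listed() {
    local dir=$1 listing=$2 file sum seen=0
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        seen=1
        sum=$(md5sum "$file" | cut -d' ' -f1)
        # -F matches the checksum as a fixed string; -q only sets the exit status
        grep -q -F -- "$sum" "$listing" || return 1
    done
    # an empty directory counts as a failure, not as a pass
    [ "$seen" -eq 1 ]
}

# Demo with throwaway data; the listing format is invented for this example.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/SRR0000000"
printf 'example payload\n' > "$demo_dir/SRR0000000/SRR0000000.sra"
printf '{"md5":"%s"}\n' \
    "$(md5sum "$demo_dir/SRR0000000/SRR0000000.sra" | cut -d' ' -f1)" \
    > "$demo_dir/listing.json"

if all_md5sums_listed "$demo_dir/SRR0000000" "$demo_dir/listing.json"; then
    echo "MD5 sum check passed"
else
    echo "MD5 sum check failed" 1>&2
fi
rm -rf "$demo_dir"
```

Whether the stricter behaviour is wanted depends on what the accession directory is expected to contain; if it only ever holds the one archive, the two variants behave the same.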
Member

The only issue I can see is that we're no longer running vdb-validate on !{id}.sralite.
But I do love the added md5 sum possibility.
Can you keep the sralite part in this logic?

Author

This code is no longer needed. It was needed in older versions of sratools, but the latest version places .sralite files in a subdirectory named after the SRA ID, just like regular .sra files. prefetch downloads the dataset that gave rise to this code just fine (SRR1806585, see issue #162), and vdb-validate checks it properly. fasterq-dump still fails at a later step, as mentioned in issue #162, but that's because it uses an outdated version of sratools (2.11). I guess it makes sense to update fasterq-dump as part of this PR as well?
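
The layout described here can be illustrated with a small sketch, assuming (as the comment states) that both .sra and .sralite archives end up in a subdirectory named after the accession. The helper name `list_archives` and the file name `SRR1806585.sralite` are hypothetical and not taken from the module.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the archive files found in the subdirectory
# named after the accession $1, whichever of the two extensions they carry.
list_archives() {
    local id=$1 file
    for file in "$id"/*.sra "$id"/*.sralite; do
        # an unmatched glob stays literal, so skip anything that is not a file
        [ -f "$file" ] && printf '%s\n' "$file"
    done
    return 0
}

# Demo with an empty placeholder file in a temporary working directory.
work=$(mktemp -d)
mkdir "$work/SRR1806585"
: > "$work/SRR1806585/SRR1806585.sralite"
(cd "$work" && list_archives SRR1806585)
rm -rf "$work"
```

In the script under review, vdb-validate is given the accession rather than a file name, so the script itself does not have to branch on the extension.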

Member

I see, then we're good to merge this PR.
I'd rather we don't update fasterq-dump for now, as that would re-create other issues: #221

Contributor

@Midnighter Midnighter left a comment

From my side it looks good, but @maxulysse's comment raises a good question, of course.

@maxulysse maxulysse merged commit c60d09b into nf-core:dev May 6, 2024
62 checks passed
@suhrig suhrig deleted the update_module_sratools_prefetch branch May 6, 2024 17:36
Development

Successfully merging this pull request may close these issues.

vdb-validate does not detect file corruption
3 participants