-
Notifications
You must be signed in to change notification settings - Fork 692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use md5sum to check download integrity with vdb-validate as fallback #5024
use md5sum to check download integrity with vdb-validate as fallback #5024
Conversation
…les into md5sum_replaces_vdb-validate
@Midnighter: Do you need any further input from my side before you can start with the review? The pull request is ready for review IMO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for uncovering these problems and suggesting these changes. I see two main issues with the PR:
- We also need container images that provide the right dependencies.
- As you rightfully point out in your comment
vdb-validate
does not detect file corruption ncbi/sra-tools#896 (comment), it would be ideal to be able to check if md5sums are present before making another network request.
modules/nf-core/sratools/prefetch/templates/retry_with_backoff.sh
Outdated
Show resolved
Hide resolved
modules/nf-core/sratools/prefetch/templates/retry_with_backoff.sh
Outdated
Show resolved
Hide resolved
Thanks for all the useful feedback!
I will nudge the developers once more about this. I agree this would be ideal. |
…les into md5sum_replaces_vdb-validate
Hi @Midnighter, I have implemented your requested changes.
I no longer need
It turns out the maintainers had added feature to confirm that the output of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I no longer need jq, since I use grep to search for the valid checksum, which results in much simpler code than using jq would. So the only additional tool is curl, which is part of the sratools container already. Do we still need an additional container?
No, that's fine then.
Changes look good to me! Really happy that you managed to avoid the curl call when it's not necessary.
My only nitpick is, that I prefer long form options in scripts, e.g., grep --fixed-strings
rather than grep -F
because it's typically much more clear what the intention is then. I tend to forget myself, what the short form means, so it might also help future you 😉. It's certainly fine to merge like this, too, though!
modules/nf-core/sratools/prefetch/templates/retry_with_backoff.sh
Outdated
Show resolved
Hide resolved
modules/nf-core/sratools/prefetch/templates/retry_with_backoff.sh
Outdated
Show resolved
Hide resolved
Sure, I will adapt the script accordingly! |
I have implemented the suggestions. I also added an error message when the manual MD5 sum check failed, because I realized it may be confusing when the process just fails silently. Do you know why the automated checks won't launch? They have been stuck in "awaiting approval" state from the start. I cannot merge the changes otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
…les into md5sum_replaces_vdb-validate
Head branch was pushed to by a user without write access
I have satisfied the requirements of the failed editorconfig check. However, as a result of this the auto-merge feature was disabled and the PR is stuck in awaiting approval again. Can you kindly explain what you did to fix this so I can help myself in the future? |
You need to have certain permissions on the repository. I'm not exactly sure which role. Probably at least write permissions. |
…les into md5sum_replaces_vdb-validate
Head branch was pushed to by a user without write access
Hi @Midnighter. Sorry to bother you again. There was a hidden failed test which I did not notice, because all the ticks in the overview were green. A few MD5 sums of files which nf-test checked for integrity needed updating. As a result of my new commit, the checks are in pending state again. :-/ I don't know why this happens or how to fix this. So I have to ask you (hopefully only) once more to work your magic? Thanks! Also, GitHub added a new reviewer automatically. I did not do this. |
No worries, not your fault. |
Head branch was pushed to by a user without write access
One more push please? I used long options for |
It's so annoying when the accepted options for these standard tools are different between Mac's Unix and Linux. Or was it the conda version? |
It looks like the busybox implementation is used for the tests. The tests have completed successfully. 🥳 Do you have permission to merge the changes or shall we wait for the review from @drpatelh? I think GitHub added him as a reviewer when I modified the MD5 sums of the nf-test files, presumably because he is the owner of those files. |
Strange, I had set it to auto merge already... not sure why that didn't happen. |
Many thanks for all your help! |
PR checklist
As explained in ncbi/sra-tools#896,
vdb-validate
does not detect file corruption if the prefetched files do not contain MD5 checksums. It has happened to me many times that downloaded files turn out to be corrupt, if I use the optionforce_sratools_download
(now--download_method sratools
). What is worse is that extracting the files usingfasterq-dump
does not always result in an error even if the file is corrupt. It is even conceivable that the extracted FastQ file looks perfectly intact with only some bases or quality values being changed. As such, the error may go completely unnoticed.This PR fixes this by (1) fetching the md5sum from the SRA Data Locator API and (2) performing a manual md5sum check. The current method,
vdb-validate
is only used anymore if the md5sum cannot be obtained from the API.