`vdb-validate` does not detect file corruption #285
Comments
I have submitted a PR to the modules repository: nf-core/modules#5024
My PR has been merged into the main branch of nf-core/modules. What is the procedure to get this into the fetchngs pipeline?
Make a PR that updates the module to the latest state. Hopefully, you can simply use the nf-core tool for that.
@suhrig please let me know if you have any issues doing that.
Thanks for your guidance. It worked. The PR is ready for review IMO.
Description of the bug
As explained in ncbi/sra-tools#896, `vdb-validate` does not detect file corruption if the prefetched files do not contain MD5 checksums. It has happened to me many times that downloaded files turn out to be corrupt when I use the option `force_sratools_download`. What is worse is that extracting the files using `fasterq-dump` does not always result in an error even if the file is corrupt. It is even conceivable that the extracted FastQ file looks perfectly intact with only some bases or quality values being changed. As such, the error may go completely unnoticed.
I propose that the validation procedure be changed. Namely, I find that using the following `curl` command to fetch the MD5 sum of the prefetched SRA file and then using the `md5sum` command-line utility to confirm the checksum should be more reliable:
Admittedly, I don't know whether there are situations where the MD5 sum cannot be obtained via the above `curl` command. Maybe it would be best to first try to obtain the MD5 sum, and if this fails use the current `vdb-validate` command as a fallback.
Possibly, I will find time to submit a PR. I'm reporting this here in case someone else is faster.
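To illustrate the "checksum first, `vdb-validate` as fallback" idea, here is a minimal Python sketch. It is only an illustration under assumptions, not what the pipeline does: the function names (`file_md5`, `run_vdb_validate`, `validate_sra`) are hypothetical, Python's `hashlib` stands in for the `md5sum` utility, and the expected MD5 sum is passed in by the caller, since the `curl` lookup is not reproduced here.

```python
import hashlib
import subprocess
from typing import Callable, Optional


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_vdb_validate(path: str) -> bool:
    """Fallback check: treat a zero exit status of vdb-validate as success.

    Assumes vdb-validate (from sra-tools) is on PATH.
    """
    return subprocess.run(["vdb-validate", path]).returncode == 0


def validate_sra(
    path: str,
    expected_md5: Optional[str] = None,
    fallback: Callable[[str], bool] = run_vdb_validate,
) -> bool:
    """Compare against the expected MD5 if one is known, else use the fallback."""
    if expected_md5:
        # Normalise whitespace and case before comparing hex digests.
        return file_md5(path) == expected_md5.strip().lower()
    # No checksum could be obtained: fall back to the current validation.
    return fallback(path)
```

A caller would pass whatever MD5 sum it managed to obtain for the accession, or `None` if the lookup failed, e.g. `validate_sra("SRR000001.sra", expected_md5=md5_from_lookup)`, where both the file name and `md5_from_lookup` are placeholders.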
Command used and terminal output
No response
Relevant files
No response
System information
No response