vdb-validate does not detect file corruption #285

Closed
suhrig opened this issue Feb 20, 2024 · 5 comments · Fixed by #305
Labels
bug Something isn't working

Comments

@suhrig
suhrig commented Feb 20, 2024

Description of the bug

As explained in ncbi/sra-tools#896, vdb-validate does not detect file corruption if the prefetched files do not contain MD5 checksums. When I use the option force_sratools_download, downloaded files have turned out to be corrupt many times. Worse, extracting the files with fasterq-dump does not always produce an error even if the file is corrupt. It is even conceivable that the extracted FastQ file looks perfectly intact while some bases or quality values have been changed, so the error may go completely unnoticed.

I propose changing the validation procedure: fetch the MD5 sum of the prefetched SRA file with the following curl command, then confirm the checksum with the md5sum command-line utility. I find this should be more reliable:

curl 'https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?filetype=run&acc=SRRxxxxxxx'

Admittedly, I don't know whether there are situations where the MD5 sum cannot be obtained via the above curl command. Maybe it would be best to first try to obtain the MD5 sum, and if this fails use the current vdb-validate command as a fallback.
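
The fallback idea above could be sketched roughly as follows. This is only an illustration, not the code from any PR: the function names (extract_md5, md5_matches, validate_sra) are hypothetical, and it assumes the response of the curl command is JSON containing a field of the form "md5": "<32 hex digits>". It also takes the first such field, which may not be the right one if the response lists several files for one run.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the JSON field layout are assumptions.

# Print the first 32-hex-digit "md5" value found in JSON read from stdin.
# Prints nothing if no such field is present.
extract_md5() {
    grep -oE '"md5"[[:space:]]*:[[:space:]]*"[0-9a-fA-F]{32}"' \
        | head -n 1 \
        | grep -oE '[0-9a-fA-F]{32}' \
        | tr 'A-F' 'a-f'
}

# Return 0 if the MD5 of file $1 equals the expected lowercase hex digest $2.
md5_matches() {
    local actual
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Validate file $2 for accession $1.
# Prefer the remotely reported MD5; fall back to vdb-validate if none is found.
validate_sra() {
    local acc=$1 file=$2 expected
    expected=$(curl -fsS "https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?filetype=run&acc=${acc}" | extract_md5)
    if [ -n "$expected" ]; then
        md5_matches "$file" "$expected"
    else
        # No checksum could be obtained: use the current check as a fallback.
        vdb-validate "$file"
    fi
}
```

A call would look like `validate_sra SRRxxxxxxx path/to/file.sra`, with a non-zero exit status indicating a checksum mismatch or a vdb-validate failure.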

Possibly, I will find time to submit a PR. I'm reporting this here in case someone else is faster.

Command used and terminal output

No response

Relevant files

No response

System information

No response

suhrig added the bug label Feb 20, 2024
@suhrig
Author

suhrig commented Mar 1, 2024

I have submitted a PR to the modules repository: nf-core/modules#5024

@suhrig
Author

suhrig commented Apr 7, 2024

My PR has been merged into the main branch of nf-core/modules. What is the procedure to get this into the fetchngs pipeline?

@Midnighter
Contributor

Make a PR that updates the module to the latest state. Hopefully, you can simply use the nf-core tool for that.

@maxulysse
Member

@suhrig please let me know if you have any issue doing that

@suhrig
Author

suhrig commented Apr 8, 2024

Thanks for your guidance. It worked. The PR is ready for review IMO.

@suhrig suhrig closed this as completed May 6, 2024
3 participants