removed duplicated and trailing whitespace #13
Thanks Brandon - I'll check how those occurred in the index generation process.
For sequence.index files, we intended to have 5 columns to capture paired reads with their md5s and the sample name (column 5). For alignment.index files, we intended to have 4 columns for the bam and bam.bai with their md5s. None of the examples you listed above from sequence.index files were paired reads, so two empty fields were included there. For some reason, extra spaces or tabs were introduced while updating those 4 alignment index files, but these have now been fixed.
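To make the intended layout concrete, here is a minimal sketch of a column-count check. The helper name `check_columns` is hypothetical and not part of the repository. It assumes the index files are strictly tab-separated and that every line, including any header, should carry the same number of fields. Splitting on a single tab keeps empty fields, so an unpaired read with empty columns 3 and 4 still counts as 5 fields.

```shell
# check_columns FILE EXPECTED
# Hypothetical helper: print every line of FILE whose tab-separated field
# count differs from EXPECTED, and return non-zero if any such line exists.
check_columns() {
    awk -F'\t' -v n="$2" '
        NF != n { printf "%s:%d: %d fields\n", FILENAME, FNR, NF; bad = 1 }
        END     { exit (bad ? 1 : 0) }
    ' "$1"
}

# Possible usage (file names are placeholders):
#   check_columns some_sequence.index 5
#   check_columns some_alignment.index 4
```

A check like this would flag both stray extra tabs (too many fields) and rows where empty fields were dropped (too few fields).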
Okay- I think that makes sense. Let me just make sure I understand. You're saying that:
Is it also safe to assume the following?
Also- thanks for fixing those 4 alignment index files 😄 🙏
Yeah, your assumptions are correct! Also, please let us know when you find anything unusual in the index files. Personally, I really appreciate your efforts in helping us make this resource more valuable. chunlin
Glad I can help 😄 If I come across any other things, I'll share my findings in an issue. Another clarifying question for you: are the Bionano alignment files supposed to have only 2 columns (XMAP_CMAP & XMAP_CMAP_MD5)?
Actually, the Bionano xmap/cmap index is an exception to the alignment.index format, as described in https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/README.ftp_structure:
The format of sequence.index (if there is no paired data, columns 3 and 4 will be empty) is as follows:
For hdf5:
For SOLiD xsq:
For BioNano bnx:
The format of alignment.index:
For BioNano XMAP or CMAP:
Many thanks to you, Brandon. chunlin
I removed the duplicated and trailing whitespace from index files. In some cases, 2 or more tabs were present between columns. I also removed the trailing whitespace at the end of the lines. Otherwise, the text remains the same.
Issues were fixed with GNU sed like this:
sed -r -i 's,\t+,\t,g' file1 file2 ... fileN
sed -r -i 's,[\t ]+$,,' file1 file2 ... fileN
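As a rough illustration of what these two substitutions do, the sketch below applies the same expressions to standard input instead of editing files in place, so the effect can be previewed first. The function name `normalize_ws` is hypothetical, and the sketch assumes GNU sed (for `-r` and for `\t` inside the expressions).

```shell
# normalize_ws: hypothetical wrapper that applies the same two expressions
# as the commands above, reading stdin and writing stdout (no in-place edit).
normalize_ws() {
    sed -r -e 's,\t+,\t,g' -e 's,[\t ]+$,,'
}

# Preview the change for one file without modifying it
# (file name is a placeholder):
#   normalize_ws < some_alignment.index | diff some_alignment.index -
```

Because the first expression treats any run of tabs as a single separator, it cannot tell an accidental double tab from a deliberately empty field, so a preview like this is worth running on files that are expected to contain empty columns.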
Here is the list of affected files: