fastq outputs are missed during cellranger mkfastq due to directory structure #6189

Open
2 tasks done
julicudini opened this issue Aug 15, 2024 · 3 comments · May be fixed by #6190
Labels
bug Something isn't working

Comments

@julicudini

Have you checked the docs?

Description of the bug

I noticed when running the nf-core/demultiplex pipeline with cellranger_mkfastq as the demultiplexer module that the outputs do not match what I get when I run cellranger mkfastq (same version, and same version of bcl2fastq) on its own, outside of Nextflow. I tracked this down to the cellranger_mkfastq module rather than the demultiplex pipeline itself. What is missing is the outs/fastq_path/Sample_Project directory, which contains the sample fastqs; the outs/fastq_path directory itself contains only the Undetermined fastqs (explained here):

This example was produced with a sample sheet that included tiny-bcl as the Sample_Project, so the directory containing the sample folders is called tiny-bcl. If a Sample_Project was not specified, or if a simple layout CSV file was used (which does not have a Sample_Project column), the directory containing the sample folders would be named according to the flow cell ID instead.
ls -l tiny-bcl/outs/fastq_path/

drwxr-xr-x 3 jdoe jdoe 3 Nov 14 12:26 Reports
drwxr-xr-x 2 jdoe jdoe 8 Nov 14 12:26 Stats
drwxr-xr-x 3 jdoe jdoe 3 Nov 14 12:26 tiny-bcl (note this is the key dir where sample fastqs are)
-rw-r--r-- 1 jdoe jdoe 20615106 Nov 14 12:26 Undetermined_S0_L001_I1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 20615106 Nov 14 12:26 Undetermined_S0_L001_I2_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 51499694 Nov 14 12:26 Undetermined_S0_L001_R1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 152692701 Nov 14 12:26 Undetermined_S0_L001_R2_001.fastq.gz

This means that the output declaration for cellranger mkfastq in main.nf, which globs outs/fastq_path/*.fastq.gz, captures only the Undetermined files and misses the actual sample files. The line currently reads
path "**/outs/fastq_path/*.fastq.gz", emit: fastq
and I was able to fix this by changing the line to
path("*_outs/outs/fastq_path/{*.fastq.gz,**/*.fastq.gz}"), emit: fastq
which instead captures fastq files directly under fastq_path as well as in any nested directory. I think this is better than trying to infer the flow cell ID in order to search a directory that may or may not exist. I have a PR that I can submit to make this small fix.
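To illustrate the difference between the two patterns, here is a rough sketch in Python that builds an empty directory tree shaped like the listing above and compares a single-level glob with a recursive one. The `run_outs` prefix, the `sample1` folder, and the sample file names are made up for the illustration, and Python's `pathlib` globbing is only an approximation of how Nextflow resolves `path` patterns, so treat this as a demonstration of the idea rather than a test of the module.

```python
import tempfile
from pathlib import Path


def build_layout(root: Path) -> None:
    """Create empty files mimicking a cellranger mkfastq output tree."""
    # "run_outs" is a hypothetical directory name matching the *_outs prefix.
    fastq_path = root / "run_outs" / "outs" / "fastq_path"
    files = [
        fastq_path / "Undetermined_S0_L001_R1_001.fastq.gz",
        fastq_path / "Undetermined_S0_L001_R2_001.fastq.gz",
        # Hypothetical sample files inside the Sample_Project directory.
        fastq_path / "tiny-bcl" / "sample1" / "sample1_S1_L001_R1_001.fastq.gz",
        fastq_path / "tiny-bcl" / "sample1" / "sample1_S1_L001_R2_001.fastq.gz",
    ]
    for f in files:
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()


def match(root: Path, *patterns: str) -> list[str]:
    """Return the sorted, de-duplicated file names matched by any pattern."""
    found = set()
    for pattern in patterns:
        found.update(p.name for p in root.glob(pattern) if p.is_file())
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    # Pattern shaped like the current output declaration: one level only.
    current = match(root, "**/outs/fastq_path/*.fastq.gz")
    # Patterns shaped like the proposed fix: top level plus nested dirs.
    proposed = match(
        root,
        "*_outs/outs/fastq_path/*.fastq.gz",
        "*_outs/outs/fastq_path/**/*.fastq.gz",
    )

print("current: ", current)
print("proposed:", proposed)
```

In this sketch the single-level pattern picks up only the Undetermined files, while the union of the two patterns also picks up the files nested under the Sample_Project directory, without needing to know whether that directory is named after the project or the flow cell ID.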

Command used and terminal output

No response

Relevant files

No response

System information

No response

@julicudini julicudini added the bug Something isn't working label Aug 15, 2024
@julicudini julicudini linked a pull request Aug 15, 2024 that will close this issue
@apeltzer
Member

Should also be a bug report in demultiplex, we might have to do a release for 1.5.1 to fix this

@apeltzer
Member

cc @atrigila / @nschcolnicov

@nschcolnicov
Contributor

@julicudini Thank you very much for the detailed description and proposed solution, I'll fix this ASAP


5 participants