Number of barcodes doesn't match with cellranger output. #23

Open
NKleinenkuhnen opened this issue Jan 10, 2020 · 1 comment

@NKleinenkuhnen

Hey,

Thank you very much for your awesome tool! I recently ran into a problem while trying to create a snap file from the output of cellranger. So far I have tried two entry points: (1) the position-sorted BAM file and (2) the fragments TSV file.
However, with both approaches I ended up with far more barcodes in my snap file than the 10x report lists. In scenario 1 I get 40k barcodes and in scenario 2 20k, whereas according to the 10x summary the dataset should contain 8199 cells.
I followed your excellent step-by-step tutorial (https://github.com/r3fang/SnapATAC/wiki/FAQs#10X_snap), copying the commands and changing only the filenames. I used Python 3.7 and the latest version of SnapTools on my Mac.
Importing the snap file into R and processing it works like a charm, but I couldn't solve the barcode issue myself.
I should note that in scenario 1 I had two samples, which I processed separately with the same commands and then merged via createSnap. I hope this is enough information; if you need more, just let me know.
Thanks in advance!
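One way to check whether the extra barcodes are simply non-cell barcodes is to compare the number of distinct barcodes in the fragments file with the number of cell-called barcodes. The two helpers below are hypothetical (`count_fragment_barcodes` and `list_cell_barcodes` are not part of SnapTools or Cell Ranger). They assume the usual fragments layout with the barcode in column 4, and a cellranger-atac `singlecell.csv` with the barcode in column 1 and a cell flag column named `is__cell_barcode`; please check both assumptions against your own files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of SnapTools or Cell Ranger).

# Count distinct barcodes in a fragments file (gzipped or plain text).
# Assumes the barcode is in column 4 and skips '#' comment lines.
count_fragment_barcodes() {
  # gzip -dcf decompresses .gz input and passes plain text through unchanged
  gzip -dcf "$1" |
    awk -F'\t' '!/^#/ && !($4 in seen) { seen[$4] = 1; n++ } END { print n + 0 }'
}

# Print the cell-called barcodes from a singlecell.csv-style file.
# Assumes the barcode is in column 1 and the cell flag column is named
# is__cell_barcode (1 = cell); the column is located by its header name.
list_cell_barcodes() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "is__cell_barcode") col = i; next }
    col && $col == 1 { print $1 }
  ' "$1"
}
```

Running `count_fragment_barcodes` on the fragments file and `list_cell_barcodes singlecell.csv | wc -l` gives two numbers to set against the ~20k barcodes in the snap file and the 8199 cells in the 10x summary. If the snap count tracks the first number, the cell list could be used to subset the snap object downstream.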

@mej54

mej54 commented Jan 27, 2020

Hi there,

I've also noticed that the barcodes differ from the CellRanger output. I've been using snap-pre with possorted_bam.bam from CellRanger to create snap files, as outlined below:

```shell
cat <( cat $DIR/$SAMPLE.header.sam ) \
<( samtools view $BAM | awk '{for (i=12; i<=NF; ++i) { if ($i ~ "^CB:Z:"){ td[substr($i,1,2)] = substr($i,6,length($i)-5); } }; printf "%s:%s\n", td["CB"], $0 }' ) \
| samtools view -bS - > $DIR/$SAMPLE.snap.bam

samtools sort -n -@ 10 -m 1G $DIR/$SAMPLE.snap.bam -o $DIR/$SAMPLE.snap.nsrt.bam
```
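One detail of the awk step above may be worth ruling out: awk arrays persist across input lines, so `td["CB"]` is never cleared, and a read without a `CB:Z` tag is written out with the previous read's barcode (or an empty prefix if no earlier read had one). This does not by itself explain barcodes that match `CR:Z`, but it can attach reads to the wrong barcode. The sketch below is a hypothetical variant (`prefix_cb` is not from the SnapATAC FAQ) that resets the barcode for every read and drops reads without a `CB:Z` tag. It reads SAM including its header on stdin, so it assumes `samtools view -h` is used instead of the separate header file.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the awk step (not from the SnapATAC FAQ).
# Reads SAM with header on stdin and writes SAM to stdout:
#   - header lines (starting with '@') pass through unchanged;
#   - each read name is prefixed with its own CB:Z value as "BARCODE:readname";
#   - the barcode is reset for every read, and reads without CB:Z are dropped
#     (their number is reported on stderr).
prefix_cb() {
  awk -F'\t' '
    /^@/ { print; next }
    {
      cb = ""
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^CB:Z:/) { cb = substr($i, 6); break }
      }
      if (cb == "") { dropped++; next }
      print cb ":" $0
    }
    END {
      if (dropped) printf "prefix_cb: dropped %d reads without CB:Z\n", dropped > "/dev/stderr"
    }
  '
}
```

It could stand in for the `cat <( ... )` step, e.g. `samtools view -h $BAM | prefix_cb | samtools view -bS - > $DIR/$SAMPLE.snap.bam`, followed by the same `samtools sort -n` call. Dropping untagged reads is a design choice; keeping them under a placeholder barcode would be an alternative if snap-pre should still see them.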

When I looked into the promoter ratio using the single_cell.csv files, I noticed there were barcodes in the snap files that were not in the single_cell.csv files (which I believe should contain all fragments). Looking into this further, I'm wondering whether at some point snap-pre takes the barcode from the "CR:Z" tag in the BAM instead of the error-corrected barcode in "CB:Z". When I search for the barcodes in the snap file that weren't found in the single_cell.csv file, they match the barcodes under CR:Z, not CB:Z (even though the CB:Z barcode was added to the read name as outlined above).
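To check this observation on other data, one could list the barcode prefixes from the name-sorted BAM, find those absent from single_cell.csv, and then count how often each one appears as a `CB:Z` versus a `CR:Z` value in the original BAM. The two helpers below are hypothetical (`missing_from_csv` and `barcode_tags` are illustrations, not SnapTools commands). They assume the CSV has a header row with the barcode in column 1, and that read names were prefixed as `BARCODE:readname` by the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for tracing unexpected barcodes (illustration only).

# Print barcodes from a one-per-line list (arg 1) that are absent from
# column 1 of a CSV with a header row (arg 2); each is printed once.
missing_from_csv() {
  awk -F',' '
    NR == FNR { if (FNR > 1) known[$1] = 1; next }
    !($1 in known) && !($1 in printed) { print $1; printed[$1] = 1 }
  ' "$2" "$1"
}

# Read SAM records on stdin and count how many carry the given barcode
# as their CB:Z (corrected) value versus their CR:Z (raw) value.
barcode_tags() {
  awk -F'\t' -v bc="$1" '
    !/^@/ {
      for (i = 12; i <= NF; i++) {
        if ($i == "CB:Z:" bc) cb++
        else if ($i == "CR:Z:" bc) cr++
      }
    }
    END { printf "CB:Z %d\tCR:Z %d\n", cb, cr }
  '
}
```

For example, `samtools view $DIR/$SAMPLE.snap.nsrt.bam | cut -f1 | cut -d: -f1 | sort -u > snap_barcodes.txt` produces the list for `missing_from_csv snap_barcodes.txt single_cell.csv`, and each reported barcode can then be passed to `samtools view $BAM | barcode_tags BARCODE` (a full scan of the BAM, so it may be slow).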

Thanks!
