Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cause of read depth of 0.000x after polishing #13

Open
desmodus1984 opened this issue Jul 22, 2024 · 5 comments
Open

Cause of read depth of 0.000x after polishing #13

desmodus1984 opened this issue Jul 22, 2024 · 5 comments

Comments

@desmodus1984
Copy link

Hi,

I am trying to assemble the genome of a worm, and I was hoping that minipolish would detect the mitochondrial genome and output it in the gfa file.
I have sequencing one of my worm species a lot, and I couldn't get much data, furthermore, the HPC system has a GPU card that is not compatible for dorado, so I am using dorado in CPU mode. After all the troubles, I was able to get about 3 Gb of data, but many reads with low quality, and many short, I filtered out reads with quality less than 20 and shorter than 1000 bp. My dataset is 1.3 Gb, with max read length of 113kb and average of 8.5k.

I followed the instructions:
minimap2 -x ava-ont -t 12 Ju760.Jul22.fastq Ju760.Jul22.fastq | gzip -1 > Ju760.Jul22.paf.gz ``miniasm -f Ju760.Jul22.fastq ./Ju760.Jul22.paf.gz > Ju760.Jul22.gfa which worked fine,

but then, when I tried running minipolish
minipolish -t 12 Ju760.Jul22.fastq Ju760.Jul22.gfa > Ju760.Jul22.polish.gfa
I got a lot of empty segments removed,
Racon round 2, started with this info
Running Racon on round_2:
reads: Ju760.Jul22.fastq (156,105 reads)
input: /tmp/tmpbvbg0wzi/round_2.fasta (0 bp)
alignments: /tmp/tmpbvbg0wzi/round_2.paf (0 alignments)
output: /tmp/tmpbvbg0wzi/round_2_polished.fasta (0 bp)

And the output was:
Aligning reads:
reads: Ju760.Jul22.fastq (156,105 reads)
contigs: /tmp/tmpbvbg0wzi/depths.fasta (0 bp)
alignments: /tmp/tmpbvbg0wzi/depths.paf (0 alignments)
mean depth: 0.000x

Could you explain how could I get a mean depth of 0.000x, I would expect to have at least 2x, from my data.
Is there any parameter I can change during the process to get some gfa, at least the mitochondrial genome?

For instance, I mapped the reads to the c elegans mitogenome, and I got 277 sequences, making about 1.3 MB.
Should I be able to recover the mitochondrial genome in the gfa?

Thanks;

Juan Pablo

@rrwick
Copy link
Owner

rrwick commented Jul 25, 2024

Hi Juan,

Something has clearly gone wrong in during assembly or polishing: the input file to Racon seems to have a total size of 0 bp. I'm not sure where the issue is, but an assembly with a mean depth of 2x is bound to have some problems!

If all you're interested in is the mitochondrial genome, I would suggest trying to assemble just the 277 reads you identified as mitochondrial. This should be very fast and you'll know that it worked well if you get a circular contig of about the right size.

Ryan

@desmodus1984
Copy link
Author

Hello Ryan,

I am extremely surprised right now haha.

I was finally able to get all the reads from my small/inefficient Nanopore run through dorado-CPU. I mapped them to the mitochondrial genome of C. elegans, and then I recovered the mapped reads.
I tried the mapping/assembly/polishing twice, once, with all the reads, and then with reads with max-length of 15k. I used this max because the genome of C elegans is 13.7 kb.

I would appreciate if you could give me your comments about my approaches.
I am surprised because the mitochondrial assembly with all the reads (max 44kb), gave me a circular mitochondrial genome (yay! you can imagine how happy I am!) of 10,043 bp and depth of 163x; the one with reads up to 15kb, I got a circular mitochondrial genome (YAY!!!), BUT SIGNIFICANTLY LONGER - 15,959 bp and Depth of 102.5x.
image.

Could you find a reason why using longer reads gave such a shorter circular graph?
I would expect that the reads in both cases would support or pinpoint to a mitochondrial genome of size Z; if the reads are weird, I imagine so hard trimming is happening, and edges with low support would be cut, but I don't see why it would end in a shorter graph.
image

The code I used goes like this:
minimap2 -x ava-ont -t 12 Ju760.Lig.P-1.mapped.max15k.fastq Ju760.Lig.P-1.mapped.max15k.fastq | gzip -1 > Ju760.Jul25.max15k.paf.gz miniasm -f Ju760.Lig.P-1.mapped.max15k.fastq **./**Ju760.Jul25.max15k.paf.gz > Ju760.Jul25.max15k.gfa minipolish -t 12 Ju760.Lig.P-1.mapped.max15k.fastq Ju760.Jul25.max15k.gfa > Ju760.Jul25.max15k.polish.gfa
Does it matter the "./" in miniasm?
Without it, It doesn't seem to work.

Looking forward to your comments and suggestions to help me confirm that the 15kb mitochondrial genome I obtained is reliable and I can trust it.

Thank you very much;

Juan Pablo

@rrwick
Copy link
Owner

rrwick commented Jul 26, 2024

That's a tough one, and I don't know why you're getting the discrepancy. It will probably take some experimentation and detective work!

Some miscellaneous thoughts:

  • You're using a C. elegans mito genome to select your reads. Is that a reliable reference? If the C. elegans mito genome is too different from yours, you might be missing some reads.
  • Try other assemblers, e.g. Canu and Flye on your read sets. See if they tend to produce the 10 kbp version or the 15 kbp version. Can then use Trycycler to build a clean consensus from your alternative assemblies.
  • Long-read assemblers often struggle with small replicons like this. Assuming that your mito genome isn't full of repeats, you might paradoxically get a better assembly by chopping up your ONT reads into smaller pieces (e.g. 2 kbp) and then assembling.
  • For each of your assemblies, try aligning your ONT reads and then visualising the alignments in IGV. Sometimes this can help you spot oddities.

@desmodus1984
Copy link
Author

Thanks.

Do you know if the polished assembly graph can't be polished a second round?
I got all the read from the other sequencing attemps, and I got some more reads, and I got a bigger asembly, 15976, when the one I got first was 15959. Unless the algorithm is deterministic and not stochastic/random, I should get the same result.
Thus, I got the polish assembly graph: Ju760.Jul28.polish.min500.gfa, and tried again to repolish it:

minipolish -t 12 Ju760.Jul28.mapped.min100.max15.fastq Ju760.Jul28.polish.min500.gfa > Ju760.Jul28.polish.min500.1.gfa

and I got an error:
Checking requirements
Minipolish requires Minimap2 and Racon to run, so it checks for these tools now.

Minimap2 found: /home/juaguila/miniconda3/envs/pomoxis/bin/minimap2 (v2.22-r1101)
Racon found: /home/juaguila/miniconda3/envs/pomoxis/bin/racon (v1.4.20)

Loading graph
Loading the miniasm GFA graph into memory.

Ju760.Jul28.polish.min500.gfa
1 segments (15,976 bp)
2 links

Initial polishing round
The first round of polishing is done on a per-segment basis and only uses reads which are definitely
associated with the segment (because the GFA indicated that they were used to make the segment).

Error: could not find /tmp/tmpabf382_o/utg000001c_reads.fastq

Could you tell why this is happening?

Thanks;

@rrwick
Copy link
Owner

rrwick commented Jul 29, 2024

It seems that Minipolish is having issues getting reads from its temporary file. Maybe you don't have read/write permissions for /tmp or something like that? You could try setting your TMPDIR environment variable to somewhere else (where you definitely have read/write permissions).

If that doesn't help, can you upload the reads here or email them to me and I'll see if I can reproduce the problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants