Cause of read depth of 0.000x after polishing #13

desmodus1984 · 2024-07-22T19:17:16Z

Hi,

I am trying to assemble the genome of a worm, and I was hoping that minipolish would detect the mitochondrial genome and output it in the gfa file.
I have sequencing one of my worm species a lot, and I couldn't get much data, furthermore, the HPC system has a GPU card that is not compatible for dorado, so I am using dorado in CPU mode. After all the troubles, I was able to get about 3 Gb of data, but many reads with low quality, and many short, I filtered out reads with quality less than 20 and shorter than 1000 bp. My dataset is 1.3 Gb, with max read length of 113kb and average of 8.5k.

I followed the instructions:
minimap2 -x ava-ont -t 12 Ju760.Jul22.fastq Ju760.Jul22.fastq | gzip -1 > Ju760.Jul22.paf.gz ``miniasm -f Ju760.Jul22.fastq ./Ju760.Jul22.paf.gz > Ju760.Jul22.gfa which worked fine,

but then, when I tried running minipolish
minipolish -t 12 Ju760.Jul22.fastq Ju760.Jul22.gfa > Ju760.Jul22.polish.gfa
I got a lot of empty segments removed,
Racon round 2, started with this info
Running Racon on round_2:
reads: Ju760.Jul22.fastq (156,105 reads)
input: /tmp/tmpbvbg0wzi/round_2.fasta (0 bp)
alignments: /tmp/tmpbvbg0wzi/round_2.paf (0 alignments)
output: /tmp/tmpbvbg0wzi/round_2_polished.fasta (0 bp)

And the output was:
Aligning reads:
reads: Ju760.Jul22.fastq (156,105 reads)
contigs: /tmp/tmpbvbg0wzi/depths.fasta (0 bp)
alignments: /tmp/tmpbvbg0wzi/depths.paf (0 alignments)
mean depth: 0.000x

Could you explain how could I get a mean depth of 0.000x, I would expect to have at least 2x, from my data.
Is there any parameter I can change during the process to get some gfa, at least the mitochondrial genome?

For instance, I mapped the reads to the c elegans mitogenome, and I got 277 sequences, making about 1.3 MB.
Should I be able to recover the mitochondrial genome in the gfa?

Thanks;

Juan Pablo

The text was updated successfully, but these errors were encountered:

rrwick · 2024-07-25T04:39:21Z

Hi Juan,

Something has clearly gone wrong in during assembly or polishing: the input file to Racon seems to have a total size of 0 bp. I'm not sure where the issue is, but an assembly with a mean depth of 2x is bound to have some problems!

If all you're interested in is the mitochondrial genome, I would suggest trying to assemble just the 277 reads you identified as mitochondrial. This should be very fast and you'll know that it worked well if you get a circular contig of about the right size.

Ryan

desmodus1984 · 2024-07-25T23:04:10Z

Hello Ryan,

I am extremely surprised right now haha.

I was finally able to get all the reads from my small/inefficient Nanopore run through dorado-CPU. I mapped them to the mitochondrial genome of C. elegans, and then I recovered the mapped reads.
I tried the mapping/assembly/polishing twice, once, with all the reads, and then with reads with max-length of 15k. I used this max because the genome of C elegans is 13.7 kb.

I would appreciate if you could give me your comments about my approaches.
I am surprised because the mitochondrial assembly with all the reads (max 44kb), gave me a circular mitochondrial genome (yay! you can imagine how happy I am!) of 10,043 bp and depth of 163x; the one with reads up to 15kb, I got a circular mitochondrial genome (YAY!!!), BUT SIGNIFICANTLY LONGER - 15,959 bp and Depth of 102.5x.
.

Could you find a reason why using longer reads gave such a shorter circular graph?
I would expect that the reads in both cases would support or pinpoint to a mitochondrial genome of size Z; if the reads are weird, I imagine so hard trimming is happening, and edges with low support would be cut, but I don't see why it would end in a shorter graph.

The code I used goes like this:
minimap2 -x ava-ont -t 12 Ju760.Lig.P-1.mapped.max15k.fastq Ju760.Lig.P-1.mapped.max15k.fastq | gzip -1 > Ju760.Jul25.max15k.paf.gz miniasm -f Ju760.Lig.P-1.mapped.max15k.fastq **./**Ju760.Jul25.max15k.paf.gz > Ju760.Jul25.max15k.gfa minipolish -t 12 Ju760.Lig.P-1.mapped.max15k.fastq Ju760.Jul25.max15k.gfa > Ju760.Jul25.max15k.polish.gfa
Does it matter the "./" in miniasm?
Without it, It doesn't seem to work.

Looking forward to your comments and suggestions to help me confirm that the 15kb mitochondrial genome I obtained is reliable and I can trust it.

Thank you very much;

Juan Pablo

rrwick · 2024-07-26T01:57:01Z

That's a tough one, and I don't know why you're getting the discrepancy. It will probably take some experimentation and detective work!

Some miscellaneous thoughts:

You're using a C. elegans mito genome to select your reads. Is that a reliable reference? If the C. elegans mito genome is too different from yours, you might be missing some reads.
Try other assemblers, e.g. Canu and Flye on your read sets. See if they tend to produce the 10 kbp version or the 15 kbp version. Can then use Trycycler to build a clean consensus from your alternative assemblies.
Long-read assemblers often struggle with small replicons like this. Assuming that your mito genome isn't full of repeats, you might paradoxically get a better assembly by chopping up your ONT reads into smaller pieces (e.g. 2 kbp) and then assembling.
For each of your assemblies, try aligning your ONT reads and then visualising the alignments in IGV. Sometimes this can help you spot oddities.

desmodus1984 · 2024-07-28T21:48:38Z

Thanks.

Do you know if the polished assembly graph can't be polished a second round?
I got all the read from the other sequencing attemps, and I got some more reads, and I got a bigger asembly, 15976, when the one I got first was 15959. Unless the algorithm is deterministic and not stochastic/random, I should get the same result.
Thus, I got the polish assembly graph: Ju760.Jul28.polish.min500.gfa, and tried again to repolish it:

minipolish -t 12 Ju760.Jul28.mapped.min100.max15.fastq Ju760.Jul28.polish.min500.gfa > Ju760.Jul28.polish.min500.1.gfa

and I got an error:
Checking requirements
Minipolish requires Minimap2 and Racon to run, so it checks for these tools now.

Minimap2 found: /home/juaguila/miniconda3/envs/pomoxis/bin/minimap2 (v2.22-r1101)
Racon found: /home/juaguila/miniconda3/envs/pomoxis/bin/racon (v1.4.20)

Loading graph
Loading the miniasm GFA graph into memory.

Ju760.Jul28.polish.min500.gfa
1 segments (15,976 bp)
2 links

Initial polishing round
The first round of polishing is done on a per-segment basis and only uses reads which are definitely
associated with the segment (because the GFA indicated that they were used to make the segment).

Error: could not find /tmp/tmpabf382_o/utg000001c_reads.fastq

Could you tell why this is happening?

Thanks;

rrwick · 2024-07-29T00:38:12Z

It seems that Minipolish is having issues getting reads from its temporary file. Maybe you don't have read/write permissions for /tmp or something like that? You could try setting your TMPDIR environment variable to somewhere else (where you definitely have read/write permissions).

If that doesn't help, can you upload the reads here or email them to me and I'll see if I can reproduce the problem?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cause of read depth of 0.000x after polishing #13

Cause of read depth of 0.000x after polishing #13

desmodus1984 commented Jul 22, 2024

rrwick commented Jul 25, 2024

desmodus1984 commented Jul 25, 2024

rrwick commented Jul 26, 2024

desmodus1984 commented Jul 28, 2024

rrwick commented Jul 29, 2024

Cause of read depth of 0.000x after polishing #13

Cause of read depth of 0.000x after polishing #13

Comments

desmodus1984 commented Jul 22, 2024

rrwick commented Jul 25, 2024

desmodus1984 commented Jul 25, 2024

rrwick commented Jul 26, 2024

desmodus1984 commented Jul 28, 2024

rrwick commented Jul 29, 2024