some questions about the usage of 'split_dis' #6

bitcometz · 2018-09-11T02:44:25Z

Hello,

I think 'split_dis' is a great idea, and I want to use it to improve my genome assembly, for example to produce more complete repeat regions. However, I have a question about its usage:

I know cons.fasta contains the "corrected" reads. But I am not sure:
should "in.las" and "in.db" refer to the "corrected" reads or to the "raw" (uncorrected) reads?

Thanks!

split_dis
split_dis performs disagreement based read pile splitting for haplotypes and repeats. It expects four arguments:

out.las: the name of the output file, which will be written in the LAS file format
cons.fasta: a read consensus file for the reads in the input database in FastA format as produced by daccord
in.las: alignments for in.db as generated by DALIGNER
in.db: input read database

bitcometz · 2018-09-11T07:49:12Z

Hello,

I used the daccord-corrected reads (in.fasta) as input to daligner, which generated
raw_data.1.las
I then ran daccord on this alignment to generate preads.1.fasta
After that I tried to run split_dis:

bin/split_dis -t5 -d30 -D200 out1.las ./preads.1.fasta ./raw_data.1.las ./raw_data.dam

The log file stops at the output below; the program keeps running but prints no further information:

DC[186]=0 1333
DC[187]=0 1270
DC[188]=0 1270
DC[189]=0 1331
DC[190]=0 9393
DC[191]=0 1270
DC[192]=0 9094
DC[193]=0 1413
DC[194]=0 8424
DC[195]=0 6801
DC[196]=0 1331
DC[197]=0 1283
DC[198]=0 1331
[V] keep 9 106;120;146;150;188;230;239;315;321;340;370;444;447;459;465;511;574;575;636;654;762;763;767;791;798;881;905;910;913;969;983;
[V] drop 9
[V] read id 9 time 77478400
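
For anyone watching a similar run, a small script can help tell whether split_dis is still making progress. The sketch below is not part of daccord: it only pattern-matches the `DC[...]` and `[V] read id ... time ...` lines shown in the excerpt above and makes no claim about what the fields mean, since the thread does not document them. The function name `summarize_log` is hypothetical.

```python
import re

# Patterns copied from the log excerpt above. The meaning of the fields is
# not documented in this thread, so they are extracted but not interpreted.
READ_RE = re.compile(r"^\[V\] read id (\d+) time (\d+)")
DC_RE = re.compile(r"^DC\[(\d+)\]=(\d+) (\d+)")


def summarize_log(lines):
    """Return (last_read_id, last_time, dc_line_count) for an iterable of log lines."""
    last_read_id = None
    last_time = None
    dc_count = 0
    for line in lines:
        line = line.strip()
        m = READ_RE.match(line)
        if m:
            # Remember the most recent "[V] read id N time T" line.
            last_read_id = int(m.group(1))
            last_time = int(m.group(2))
            continue
        if DC_RE.match(line):
            dc_count += 1
    return last_read_id, last_time, dc_count


if __name__ == "__main__":
    sample = [
        "DC[197]=0 1283",
        "DC[198]=0 1331",
        "[V] drop 9",
        "[V] read id 9 time 77478400",
    ]
    print(summarize_log(sample))
```

Running it on the log at intervals and comparing the last read id between runs would show whether the program has moved on to another read or is still working on the same one.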

bitcometz · 2018-09-11T08:47:11Z

I also tried this:
bin/computeextrinsicqv in.fasta preads.1.fasta raw_data.dam 1

Memory usage surged to 100G, which is hard to believe because the FASTA file is only 33Mbp.

gt1 · 2018-09-12T12:04:29Z

Hi,

Concerning computeextrinsicqv: I forgot to update the documentation of the expected arguments. It is updated now; please try again following the hints in the most recent README.md.

About split_dis: so far this is mainly a proof-of-concept program. While the general idea works, it does not yet cope well with high depth or a large number of repeat instances. I have used it to separate up to 7 or 8 copies at depth 20 each, but this is already pretty slow. Anything more will probably take forever with the current implementation.

Best,
German
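
As a rough illustration of the scale mentioned in the reply above: if each repeat copy is covered at depth 20 and reads from all copies align to each other, the pile for a single repeat read holds on the order of copies × depth reads. This is a back-of-the-envelope assumption, not a description of how split_dis counts anything internally, and `pile_depth` is a hypothetical helper.

```python
def pile_depth(copies, depth_per_copy):
    """Approximate number of reads piled onto one repeat read,
    assuming every copy contributes its full per-copy depth."""
    return copies * depth_per_copy


if __name__ == "__main__":
    # The two cases reported as already "pretty slow".
    for copies in (7, 8):
        print(copies, "copies at depth 20 ->", pile_depth(copies, 20))
```

Under that assumption, the cases reported as already slow correspond to piles of roughly 140 to 160 reads.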

bitcometz · 2018-09-13T08:44:57Z

Thanks very much for your help
