some questions about the usage of 'split_dis' #6

Open
bitcometz opened this issue Sep 11, 2018 · 4 comments

@bitcometz

Hello,

I think 'split_dis' is a great idea, and I want to use it to improve my genome assembly, for example to resolve repeat regions more completely. However, I have some questions about its usage:

I know cons.fasta contains the "corrected" reads in FastA format. But I am not sure:
should the reads behind "in.las" and "in.db" be the "corrected" reads or the "raw" (uncorrected) reads?

Thanks!

split_dis
split_dis performs disagreement based read pile splitting for haplotypes and repeats. It expects four arguments

out.las: the name of the output file, which will be written in the LAS file format
cons.fasta: a read consensus file for the reads in the input database in FastA format as produced by daccord
in.las: alignments for in.db as generated by DALIGNER
in.db: input read database
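
The argument order listed above is easy to get wrong, since two of the four positional arguments are LAS files. As a minimal sketch (the helper name `build_split_dis_cmd` and the default binary path are hypothetical, and the option flags are simply copied from the command posted later in this thread without any claim about their meaning), the command line can be assembled like this:

```python
from typing import List, Sequence


def build_split_dis_cmd(
    out_las: str,
    cons_fasta: str,
    in_las: str,
    in_db: str,
    options: Sequence[str] = (),
    binary: str = "bin/split_dis",  # assumed path, taken from the thread
) -> List[str]:
    """Return the argv list for split_dis in the order the README gives:
    out.las, cons.fasta, in.las, in.db. Options go before the positionals."""
    return [binary, *options, out_las, cons_fasta, in_las, in_db]


cmd = build_split_dis_cmd(
    out_las="out1.las",
    cons_fasta="./preads.1.fasta",
    in_las="./raw_data.1.las",
    in_db="./raw_data.dam",
    options=["-t5", "-d30", "-D200"],  # flags as used in this thread
)
print(" ".join(cmd))
```

The helper only builds the argument list; it does not check that the files exist or that the LAS file matches the database.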

@bitcometz
Author

Hello,

I used the daccord-corrected reads (in.fasta) as input to daligner, which generated
raw_data.1.las.
I then ran daccord on this alignment to generate preads.1.fasta.
Then I tried to run split_dis:

bin/split_dis -t5 -d30 -D200 out1.las ./preads.1.fasta ./raw_data.1.las ./raw_data.dam

The log file stops at the lines below, and the program keeps running without printing any further information:

DC[186]=0 1333
DC[187]=0 1270
DC[188]=0 1270
DC[189]=0 1331
DC[190]=0 9393
DC[191]=0 1270
DC[192]=0 9094
DC[193]=0 1413
DC[194]=0 8424
DC[195]=0 6801
DC[196]=0 1331
DC[197]=0 1283
DC[198]=0 1331
[V] keep 9 106;120;146;150;188;230;239;315;321;340;370;444;447;459;465;511;574;575;636;654;762;763;767;791;798;881;905;910;913;969;983;
[V] drop 9
[V] read id 9 time 77478400
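
To tell whether a long run is still making progress, it can help to pull the most recent read id out of the log. The following is a rough sketch only: it assumes that a line of the form `[V] read id N time T` is printed once per processed read, which is inferred from the excerpt above and not confirmed by the program's documentation. The unit of the time value is not stated in this thread, so it is returned as a plain integer. The function name `last_read_reported` is hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Pattern inferred from the log excerpt above; treat it as an assumption.
READ_LINE = re.compile(r"^\[V\] read id (\d+) time (\d+)")


def last_read_reported(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (read id, time value) from the last matching log line, or None."""
    last = None
    for line in lines:
        match = READ_LINE.match(line.strip())
        if match:
            last = (int(match.group(1)), int(match.group(2)))
    return last


sample = [
    "DC[198]=0 1331",
    "[V] drop 9",
    "[V] read id 9 time 77478400",
]
print(last_read_reported(sample))
```

Calling the function on the same log file a few minutes apart shows whether the reported read id has advanced.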

@bitcometz
Author

I also tried this:
bin/computeextrinsicqv in.fasta preads.1.fasta raw_data.dam 1

Memory usage surged to 100G. That was hard to believe, because the fasta file is only 33 Mbp.

@gt1
Owner

gt1 commented Sep 12, 2018

Hi,

Concerning computeextrinsicqv: I forgot to update the documentation of the expected arguments. This is updated now; please try again following the hints in the most recent README.md.

About split_dis: this is so far mainly a proof-of-concept program. While it works as a general idea, it does not really cope yet with high depth or a large number of repeat instances. I have used it to separate up to 7 or 8 copies at depth 20 each, but this is already pretty slow. Anything more will probably take forever with the current implementation.

Best,
German
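
Given that runs on deep or highly repetitive piles may not finish in reasonable time, one practical option is to launch the program under a wall-clock limit. This is a generic sketch using Python's standard `subprocess` module, not a feature of split_dis itself; the helper name `run_with_limit` is hypothetical, and the demo uses a stand-in command so that it runs without the tool installed.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_limit(cmd: Sequence[str], limit_seconds: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if the time limit was hit.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        return subprocess.run(list(cmd), timeout=limit_seconds).returncode
    except subprocess.TimeoutExpired:
        return None


# Stand-in for a long-running job: a Python child that sleeps for 5 seconds.
slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
result = run_with_limit(slow_job, limit_seconds=0.5)
print("timed out" if result is None else f"exit code {result}")
```

For a real run, the stand-in list would be replaced by the split_dis command line, with a limit chosen to fit the available compute time.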

@bitcometz
Author

Thanks very much for your help
