Is there a way to sketch 1,000 files in one line #71

Closed
ZeweiSong opened this issue Dec 15, 2017 · 3 comments

Comments

@ZeweiSong

Hi,

I was wondering if there is an option to sketch many files in one command. I would also like to get all pairwise distances.

Thanks!

Zewei

@ondovb
Member

ondovb commented Dec 15, 2017

Hi Zewei,

This isn't built into Mash, but you can use some shell trickery (see #26):
echo a b c | xargs -n 1 mash sketch

For the distances, you would then need to combine the individual sketches with mash paste and give the result to mash dist twice (as both reference and query). There has been some third-party effort to streamline this process since we haven't gotten to it (see https://github.com/lskatz/mashtree, #9, #66).

@ZeweiSong
Author

ZeweiSong commented Dec 19, 2017

Thanks, I got it like this:

Create a sketch of all sequences (in a single file):

mash sketch *.fa -o genomes

Compare each sequence file with the combined sketch, and save the results in a single dist file:

echo *.fa | xargs -n 1 mash dist genomes.msh > dist.txt

I then just parsed the txt file in Python to keep the non-redundant pairs.
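The parsing script itself was not posted, so the following is only a minimal sketch of what such a filter might look like. It assumes the usual tab-separated mash dist output (reference ID, query ID, distance, p-value, matching hashes). The function name unique_pairs and the sample rows are made up for illustration.

```python
import csv
import io


def unique_pairs(lines):
    """Yield (ref, query, distance) once per unordered pair.

    Self-comparisons (A vs A) are skipped, and of the two mirrored
    rows (A vs B, B vs A) only the first one seen is kept.
    """
    seen = set()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ref, query = row[0], row[1]
        if ref == query:
            continue
        key = frozenset((ref, query))
        if key in seen:
            continue
        seen.add(key)
        yield ref, query, float(row[2])


# Made-up rows in the assumed mash dist layout, for illustration only.
sample = (
    "a.fa\ta.fa\t0\t0\t1000/1000\n"
    "a.fa\tb.fa\t0.05\t1e-10\t200/1000\n"
    "b.fa\ta.fa\t0.05\t1e-10\t200/1000\n"
    "b.fa\tb.fa\t0\t0\t1000/1000\n"
)
pairs = list(unique_pairs(io.StringIO(sample)))
print(pairs)
```

To run it on the real output, pass an open file handle instead of the sample, for example the handle returned by open("dist.txt"). The set of seen pairs grows with the number of pairs, which should be fine for about 1,000 genomes (roughly 500,000 pairs).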

@alienzj

alienzj commented Apr 3, 2018

For the second step:
mash dist genomes.msh genomes.msh > dist.txt
