[MRG] plot a random subsample with 'sourmash plot --subsample'. #343

Merged
ctb merged 7 commits into master from plot/random
Sep 29, 2017

Conversation

ctb
Contributor

@ctb ctb commented Sep 28, 2017

This adds --subsample <N> and --subsample-seed <R> to sourmash plot. With these options, sourmash plot plots a random subset of N samples, selected with Python's random.shuffle seeded by --subsample-seed. Note that the seed defaults to 1, which intentionally gives stable results when used with the same inputs.
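
For readers who want to see the idea in isolation, here is a minimal sketch of a seeded shuffle-and-truncate subsample. It is not the code from this PR: the function names `subsample_indices` and `subsample_matrix` are hypothetical, the matrix is a plain nested list, and a private `random.Random` instance stands in for the module-level `random.shuffle` so that global random state is left alone.

```python
import random


def subsample_indices(n_items, n_subsample, seed=1):
    """Return n_subsample distinct indices into range(n_items).

    Hypothetical helper: shuffles all indices with a seeded generator and
    keeps the first n_subsample, so the same seed and inputs give the same
    subset.
    """
    if not 0 < n_subsample <= n_items:
        raise ValueError("subsample size must be between 1 and the number of items")
    rng = random.Random(seed)  # private generator; global state is untouched
    indices = list(range(n_items))
    rng.shuffle(indices)
    return indices[:n_subsample]


def subsample_matrix(matrix, labels, indices):
    """Keep only the chosen rows and columns of a square matrix, plus labels."""
    sub = [[matrix[i][j] for j in indices] for i in indices]
    return sub, [labels[i] for i in indices]
```

Calling `subsample_indices(11000, 50)` twice returns the same 50 indices, which is the stable-by-default behavior described above; passing a different `seed` asks for a different draw.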

Fixes #221.

Also fixes #334, detecting multiple ksizes/moltypes earlier.
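
As an illustration of what "detecting earlier" can look like, the sketch below validates the loaded signatures before any comparison work starts. It is an assumption-laden sketch, not the PR's code: signatures are represented as plain `(ksize, moltype)` tuples and `check_single_ksize_moltype` is a hypothetical name.

```python
def check_single_ksize_moltype(sig_params):
    """Fail fast if the inputs mix k-mer sizes or molecule types.

    sig_params is a list of (ksize, moltype) tuples, one per loaded
    signature. This representation is hypothetical and only for illustration.
    """
    if not sig_params:
        raise ValueError("no signatures loaded")
    ksizes = {ksize for ksize, _ in sig_params}
    moltypes = {moltype for _, moltype in sig_params}
    if len(ksizes) > 1:
        raise ValueError("multiple ksizes loaded: %s" % sorted(ksizes))
    if len(moltypes) > 1:
        raise ValueError("multiple molecule types loaded: %s" % sorted(moltypes))
    # Exactly one of each remains; return them for the caller to use.
    return next(iter(ksizes)), next(iter(moltypes))
```

Running a check like this right after loading means the error appears before the expensive pairwise comparison rather than partway through it.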

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@ctb
Contributor Author

ctb commented Sep 28, 2017

@taylorreiter comments & review welcome!

@ctb ctb changed the title from "[MRG] plot a random subsample with 'sourmash plot --subsample'." to "[WIP] plot a random subsample with 'sourmash plot --subsample'." Sep 28, 2017
@ctb ctb mentioned this pull request Sep 28, 2017
@ctb
Contributor Author

ctb commented Sep 29, 2017

Note: I had to pin the version of the random number seed for compatibility across Python versions; see https://stackoverflow.com/questions/11929701/why-is-seeding-the-random-generator-not-stable-between-versions-of-python
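
For context, Python 3's `random.seed` takes a `version` argument, and passing it explicitly is one way to pin how a seed is interpreted. Whether this PR does exactly that is an assumption; the sketch below only shows the call, with `seeded_shuffle` as a hypothetical helper name.

```python
import random


def seeded_shuffle(items, seed=1):
    """Return a shuffled copy of items using an explicitly versioned seed.

    Assumption: version=1 selects the older seeding algorithm. It mainly
    changes how str and bytes seeds are interpreted; integer seeds are
    handled the same way under both versions.
    """
    rng = random.Random()
    rng.seed(seed, version=1)  # Python 3 only; Python 2's seed() has no version argument
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled
```

Note that this pins only the seeding step. It does not by itself promise that the resulting shuffle order matches across every Python release, so a test that hard-codes an expected order may still need per-version handling.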

@codecov-io

codecov-io commented Sep 29, 2017

Codecov Report

Merging #343 into master will increase coverage by 0.02%.
The diff coverage is 92.59%.

@@            Coverage Diff             @@
##           master     #343      +/-   ##
==========================================
+ Coverage   86.96%   86.99%   +0.02%     
==========================================
  Files          13       13              
  Lines        2018     2037      +19     
  Branches       36       36              
==========================================
+ Hits         1755     1772      +17     
- Misses        262      264       +2     
  Partials        1        1
Impacted Files Coverage Δ
sourmash_lib/commands.py 90.21% <92.59%> (-0.02%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 069fb51...8faa1b2.

@taylorreiter
Contributor

@ctb My use case was wanting to visualize divergent samples in a group of 11,000 samples that should have had similar tetranucleotide frequency throughout. I had no hypothesis as to which samples would not have similar tetranucleotide frequency, but suspected there would be some.

This provides the first step. If a random subsample is selected, I would expect that some of the time a non-similar sample would be plotted. The next interesting step, I think, would be to select the N samples most similar to sample X and plot those from a sourmash compare matrix. I could do this in R pretty easily.
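
The "N most similar to sample X" selection described above could be sketched as follows. This is only an illustration of the idea, not an existing sourmash feature: `most_similar` is a hypothetical name, and the similarity matrix is assumed to be a square nested list whose rows and columns follow the order of `labels`.

```python
def most_similar(matrix, labels, query, n):
    """Return the query label followed by its n most similar samples.

    matrix[i][j] is assumed to hold the similarity between labels[i] and
    labels[j] (higher means more similar). Hypothetical helper for
    illustration only.
    """
    x = labels.index(query)
    row = matrix[x]
    # Rank every other sample by similarity to the query, highest first.
    others = [j for j in range(len(labels)) if j != x]
    others.sort(key=lambda j: row[j], reverse=True)
    keep = [x] + others[:n]
    return [labels[j] for j in keep]
```

The returned labels can then be used to slice the matrix down to the query and its neighbors before plotting, in the same way a random subsample would be.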

Ran the code and liked the output!
test_plot_random.pdf

Contributor

@taylorreiter taylorreiter left a comment

Very quick! Nice tool!

@ctb ctb changed the title from "[WIP] plot a random subsample with 'sourmash plot --subsample'." to "[MRG] plot a random subsample with 'sourmash plot --subsample'." Sep 29, 2017
@ctb
Contributor Author

ctb commented Sep 29, 2017

Ready for review & merge, @betatim @luizirber !

@ctb
Contributor Author

ctb commented Sep 29, 2017

& thanks for trying it out, @taylorreiter :)

@ctb
Contributor Author

ctb commented Sep 29, 2017

whups! I see @taylorreiter has already approved it, so I'll merge when the tests pass :)

@ctb ctb merged commit d9ce80f into master Sep 29, 2017
@ctb ctb deleted the plot/random branch September 29, 2017 21:27
Development

Successfully merging this pull request may close these issues.

sourmash compare should complain earlier if there are multiple ksizes
sourmash plot slices
3 participants