sourmash compare matrix plot matplotlib labels too large/overlapping #2587

Open
peterjc opened this issue Apr 24, 2023 · 5 comments

@peterjc
Contributor

peterjc commented Apr 24, 2023

Running sourmash plot --pdf --labels example.npy with ~200 signatures gives plots where the labels are too large and therefore overlap.

Looking at https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/fig.py, it does not appear to alter the matplotlib default font sizes, but resources like https://stackoverflow.com/questions/3899980/how-to-change-the-font-size-on-a-matplotlib-plot suggest we might reduce the font size and/or increase the image size for larger datasets.

Is this a bug, or would your recommendation be to follow https://sourmash.readthedocs.io/en/latest/plotting-compare.html#Customizing-plots and customise the plot by writing a modified version of the sourmash/fig.py code?
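For a rough idea of what "reduce the font size and/or increase the image size" could look like, here is a minimal sketch. It is not sourmash's fig.py: `label_layout` and `plot_labelled_matrix` are hypothetical helpers, the scaling constants are guesses, the colour scale assumes similarity values in [0, 1], and the plot is a plain heatmap without the dendrograms that `sourmash plot` draws. The idea is to grow the figure with the number of labels up to a cap, then size the tick-label font to the space each label gets (72 points per inch).

```python
def label_layout(n_labels, max_fontsize=10.0, min_fontsize=2.0,
                 inches_per_label=0.12, min_inches=8.0, max_inches=40.0):
    """Hypothetical heuristic: return (figure side in inches, tick-label font size in points)."""
    # Grow the square figure with the number of labels, clamped to [min_inches, max_inches].
    side = min(max_inches, max(min_inches, n_labels * inches_per_label))
    # Rough space available per label along one axis, in points (1 inch = 72 points).
    # This uses the whole figure side, so it overestimates the space inside the axes.
    points_per_label = side * 72.0 / max(n_labels, 1)
    # Use ~80% of that space for the text, clamped to [min_fontsize, max_fontsize].
    fontsize = max(min_fontsize, min(max_fontsize, 0.8 * points_per_label))
    return side, fontsize


def plot_labelled_matrix(matrix, labels, out_path):
    """Hypothetical helper: save a labelled heatmap of a square matrix (no dendrograms)."""
    # Imported here so label_layout() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    side, fontsize = label_layout(len(labels))
    fig, ax = plt.subplots(figsize=(side, side))
    # Assumes similarity values in [0, 1].
    image = ax.imshow(matrix, vmin=0.0, vmax=1.0, cmap="YlGnBu",
                      interpolation="nearest")
    ticks = list(range(len(labels)))
    ax.set_xticks(ticks)
    ax.set_xticklabels(labels, rotation=90, fontsize=fontsize)
    ax.set_yticks(ticks)
    ax.set_yticklabels(labels, fontsize=fontsize)
    fig.colorbar(image, ax=ax)
    fig.tight_layout()  # make room for long labels
    fig.savefig(out_path)
    plt.close(fig)
```

With these guessed defaults, 200 labels give a 24-inch figure with roughly 7 pt labels, while 1000 labels hit the 40-inch cap and drop to about 2.3 pt; `plot_labelled_matrix(matrix, labels, "example.pdf")` would then write the PDF. Capping the figure size and letting the font shrink trades readability at 100% zoom for labels that no longer overlap, which suits a vector PDF that can be zoomed.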

@ctb
Contributor

ctb commented Apr 24, 2023

sourmash plot could certainly use some love! It was one of the first things we implemented ~6 years ago, and (FWIW) has driven a lot of our citations... but we haven't upgraded it, ever. This was due to some combination of:

  • I'm not a plot-focused person, and my general feeling has been that we should provide the raw data in convenient formats to support other people doing custom things with it.
  • the most plot-focused people on the sourmash team tend to be R programmers ;)
  • the slow but progressive addition of functionality that supported many more sketches, more types of comparisons, and much better naming/renaming of sketches.

This is all me saying that it's never risen to the level of "gotta fix" but has definitely risen to the level of "hmmmm yeah we should really be doing something about that."

A few related thoughts and issues -

the R package, sourmashconsumr

sourmashconsumr #2492 is an R package that has some nice viz:

[Screenshot, 2023-04-24: example sourmashconsumr visualizations]

sourmash plot isn't doing the right thing, I think

per #2406, I appear to have mixed up my similarity and distance matrices.
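For context on why that mix-up matters: hierarchical clustering routines such as SciPy's `scipy.cluster.hierarchy.linkage` expect distances (small = close), whereas a similarity matrix such as the Jaccard matrix from `sourmash compare` has large = close. A minimal sketch of the conversion, assuming similarity values in [0, 1] so that `1 - similarity` is a reasonable distance; `similarity_to_condensed` is a hypothetical helper, not a description of the actual fix for #2406.

```python
def similarity_to_condensed(similarity):
    """Hypothetical helper: turn a square similarity matrix into a condensed distance vector."""
    n = len(similarity)
    if any(len(row) != n for row in similarity):
        raise ValueError("similarity matrix must be square")
    # Upper triangle, row by row (i < j): the same ordering SciPy uses for
    # condensed distance matrices. Assumes similarities lie in [0, 1].
    return [1.0 - similarity[i][j] for i in range(n) for j in range(i + 1, n)]
```

The result could then be passed as `scipy.cluster.hierarchy.linkage(condensed, method="single")`; for NumPy arrays, `scipy.spatial.distance.squareform(1 - sim)` performs the same square-to-condensed conversion.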

better label handling, plot annotation, etc

per #2452, there are some good opportunities to make editing label names easier (since I intuit that is a lot of what people want to do)

per #2583 there are lots of opportunities to annotate dendrograms with more information

plugins are now a thing

per #1353 and #2438 in particular it would now be straightforward to experiment with other clustering and viz techniques all from within the relative safety of the sourmash command line.

this would permit the addition of dependencies that we don't want to add to core sourmash (for size and/or platform/install and/or support reasons) to support better output viz.


this is all to say... we just need someone who cares, or at least pointers to some good plots from other packages that we can steal ;). I know this is an active area, I just don't have a starting point!

@peterjc
Contributor Author

peterjc commented Apr 24, 2023

That all makes sense. One size fits all visualisation defaults are not easy.

@ctb
Contributor

ctb commented Apr 25, 2023

additional thoughts -

  • can easily make binders with R and Python scripts/notebooks that show loading & viz code and permit further customization
  • also at the very least we can provide loading code that shows how this ties into viz examples
  • might make sense to create examples/good default viz for ~10 genomes, ~100 genomes, and ~1000 genomes
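As a starting point for the "loading code" idea, here is a minimal sketch that reads a `sourmash compare` output into a NumPy array plus a list of labels. It assumes the convention that `sourmash compare -o example.npy` writes labels one per line to `example.npy.labels.txt`; `load_compare_output` is a hypothetical helper, not a sourmash API.

```python
import numpy as np


def load_compare_output(npy_path, labels_path=None):
    """Hypothetical helper: load a compare matrix and its labels, checking they agree."""
    # Assumed default location of the labels file written next to the matrix.
    if labels_path is None:
        labels_path = npy_path + ".labels.txt"
    matrix = np.load(npy_path)
    with open(labels_path) as handle:
        labels = [line.rstrip("\n") for line in handle]
    # A compare matrix should be square, with one label per row/column.
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {matrix.shape}")
    if len(labels) != matrix.shape[0]:
        raise ValueError(
            f"{len(labels)} labels for a {matrix.shape[0]}x{matrix.shape[0]} matrix"
        )
    return matrix, labels
```

`matrix, labels = load_compare_output("example.npy")` then gives inputs for matplotlib or seaborn, or the matrix can be exported (e.g. with `numpy.savetxt`) for R-based tools.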

@ctb
Contributor

ctb commented Apr 26, 2023

more from Slack:

Christopher Gulvik: Fig 1c minimum spanning tree style in GrapeTree rocks by @jcarrico and @happykhan. I've grown to appreciate it more and more for a broader audience than hierclust or phytrees to show outbreak or cluster data (SNPs, ANI, or cgMLST). The software that currently makes that style here reaches end of life this year.

[Screenshot, 2023-04-26]
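For anyone experimenting with the minimum-spanning-tree idea, the core computation is small. Below is a minimal sketch using plain Prim's algorithm on a dense distance matrix (for example `1 - Jaccard`, or `1 - ANI` with ANI as a fraction). It is not GrapeTree's own tree-building method, `minimum_spanning_tree` is a hypothetical helper, and drawing the tree is left out.

```python
import math


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns a list of (parent, child, distance) edges; O(n^2) time.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each node to the tree
    parent = [-1] * n
    best[0] = 0.0  # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Add the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Update the cheapest connection for the remaining nodes.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges
```

For larger inputs, `scipy.sparse.csgraph.minimum_spanning_tree` is an existing optimized alternative; note that it treats zero entries as missing edges, which matters for identical genomes at distance 0.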

@ctb
Contributor

ctb commented May 18, 2024

The betterplot plugin would be a good place to add custom plotting code for very large plots.
