`sc.metrics` module (add confusion matrix & Geary's C methods) #915
Conversation
The confusion matrix metric looks very similar to what …
Nice! Can you please explain your rationale for why they shouldn’t a) be normal tools and b) be saved into the AnnData object?
Sure, adding more and more features is a good moment to think about the API; I’d just like to hear why all current analysis tools belong into …
There's definitely some similarities, especially for the … As to not modifying the AnnData object: it's not obvious to me how these functions would modify the object in a helpful way. I don't expect the outputs of these functions to have a similar shape to the AnnData object particularly often. Admittedly, the example I included has that, but I'm not sure if that'll be a common use case. You could whack it in …, I guess. I'd summarize this as: I don't see added value from adding the results to the object, but I do see more complexity in implementing that.
Great! Love your clean implementations.
scanpy/metrics/_gearys_c.py (Outdated)
# TODO: Document better
# TODO: Have scanpydoc work with multipledispatch
Changes to scanpydoc go online fast; do you have something in mind?
I'm pretty open to suggestions on this.
We could do something similar to R? There they give all the signatures and then have all the arguments for all of them mixed together.
Is it possible to have multiple `Params` sections in sphinx? Otherwise, maybe we find functions which have had multiple dispatch applied, then know to stitch their docstrings together somehow?
I feel like languages that use dispatch more heavily (and not like `object.method`) tend to have more narrative docs with examples of usage.
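For what it's worth, the PR's commit log later notes "Simplify implementation to use single dispatch"; the stdlib's `functools.singledispatch` sidesteps the docstring-stitching problem because the generic function keeps a single docstring that `help()` and Sphinx see. A minimal sketch (the `gearys_c` signature and the list implementation here are illustrative, not the PR's actual code):

```python
from functools import singledispatch


@singledispatch
def gearys_c(data, vals=None):
    """Compute Geary's C.

    This generic function's docstring is the one documentation tools see,
    no matter how many implementations are registered below.
    """
    raise NotImplementedError(f"Unsupported type: {type(data)}")


@gearys_c.register
def _(data: list, vals=None):
    # Toy implementation for lists, just to show dispatch working.
    return "dispatched on list"
```

Calling `gearys_c([1, 2, 3])` dispatches on the type of the first argument, while `gearys_c.__doc__` remains the single generic docstring.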
What’s the current output? Do we at least have the docstring and signature of one of them?
`?sc.metrics.gearys_c` looks like:
Dispatched docstring
Signature: sc.metrics.gearys_c(*args, **kwargs)
Type: Dispatcher
String form: <dispatched gearys_c>
File: /usr/local/lib/python3.7/site-packages/multipledispatch/dispatcher.py
Docstring:
Multiply dispatched method: gearys_c
Inputs: <AnnData>
------------------
Calculate `Geary's C <https://en.wikipedia.org/wiki/Geary's_C>`_, as used
by `VISION <https://doi.org/10.1038/s41467-019-12235-0>`_.
Geary's C is a measure of autocorrelation for some measure on a graph. This
can be used to assess whether measures are correlated between neighboring cells.
Lower values indicate greater correlation.

.. math::

    C = \frac{(N - 1) \sum_{i,j} w_{i,j} (x_i - x_j)^2}{2W \sum_i (x_i - \bar{x})^2}
Params
------
adata
vals
Values to calculate Geary's C for. If this is two dimensional, should
be of shape `(n_features, n_cells)`. Otherwise should be of shape
`(n_cells,)`. This matrix can be selected from elements of the anndata
object by using key word arguments: `layer`, `obsm`, `obsp`, or
`use_raw`.
use_graph
Key to use for graph in anndata object. If not provided, default
neighbors connectivities will be used instead.
layer
Key for `adata.layers` to choose `vals`.
obsm
Key for `adata.obsm` to choose `vals`.
obsp
Key for `adata.obsp` to choose `vals`.
use_raw
Whether to use `adata.raw.X` for `vals`.
Returns
-------
If `vals` is two dimensional, returns a 1-dimensional ndarray. Returns
a scalar if `vals` is 1d.
Other signatures:
spmatrix, spmatrix
csr_matrix, csr_matrix
spmatrix, ndarray
spmatrix, DataFrame
spmatrix, Series
So not that much of a signature. I added a doc-string to one of the other methods and got this:
Two doc-strings
Signature: sc.metrics.gearys_c(*args, **kwargs)
Type: Dispatcher
String form: <dispatched gearys_c>
File: /usr/local/lib/python3.7/site-packages/multipledispatch/dispatcher.py
Docstring:
Multiply dispatched method: gearys_c
Inputs: <spmatrix, ndarray>
----------------------------
Params
------
g
Connectivity graph as a scipy sparse matrix. Should have shape:
`(n_obs, n_obs)`.
vals
Values to calculate Geary's C for. If one dimensional, should have
shape `(n_obs,)`. If two dimensional (i.e calculating Geary's C for
multiple variables) should have shape `(n_vars, n_obs)`.
Inputs: <AnnData>
------------------
Calculate `Geary's C <https://en.wikipedia.org/wiki/Geary's_C>`_, as used
by `VISION <https://doi.org/10.1038/s41467-019-12235-0>`_.
Geary's C is a measure of autocorrelation for some measure on a graph. This
can be used to assess whether measures are correlated between neighboring cells.
Lower values indicate greater correlation.

.. math::

    C = \frac{(N - 1) \sum_{i,j} w_{i,j} (x_i - x_j)^2}{2W \sum_i (x_i - \bar{x})^2}
Params
------
adata
vals
Values to calculate Geary's C for. If this is two dimensional, should
be of shape `(n_features, n_cells)`. Otherwise should be of shape
`(n_cells,)`. This matrix can be selected from elements of the anndata
object by using key word arguments: `layer`, `obsm`, `obsp`, or
`use_raw`.
use_graph
Key to use for graph in anndata object. If not provided, default
neighbors connectivities will be used instead.
layer
Key for `adata.layers` to choose `vals`.
obsm
Key for `adata.obsm` to choose `vals`.
obsp
Key for `adata.obsp` to choose `vals`.
use_raw
Whether to use `adata.raw.X` for `vals`.
Returns
-------
If `vals` is two dimensional, returns a 1-dimensional ndarray. Returns
a scalar if `vals` is 1d.
Other signatures:
spmatrix, spmatrix
csr_matrix, csr_matrix
spmatrix, DataFrame
spmatrix, Series
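As a cross-check of the formula in the docstring above, here is a dense NumPy sketch (a toy helper for illustration only; the PR's actual implementation works on sparse graphs and is numba-accelerated):

```python
import numpy as np


def gearys_c_dense(w: np.ndarray, x: np.ndarray) -> float:
    """Geary's C for a dense weight matrix w (n_obs x n_obs) and values x.

    Implements C = (N - 1) * sum_ij w_ij (x_i - x_j)^2
                   / (2 * W * sum_i (x_i - x_bar)^2),
    where W = sum_ij w_ij, matching the docstring's formula.
    """
    n = x.shape[0]
    diffs = x[:, None] - x[None, :]  # pairwise differences x_i - x_j
    num = (n - 1) * np.sum(w * diffs**2)
    denom = 2 * w.sum() * np.sum((x - x.mean()) ** 2)
    return num / denom


# Path graph on three nodes with a linearly increasing value:
w = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
x = np.array([0.0, 1.0, 2.0])
print(gearys_c_dense(w, x))  # 0.5 — values below 1 indicate positive autocorrelation
```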
Scanpydoc throws an error if I try to add this to the docs. The full log is below, but I think these are the relevant parts of the traceback:
File "/usr/local/lib/python3.7/site-packages/sphinx/events.py", line 103, in emit
results.append(callback(self.app, *args))
File "/usr/local/lib/python3.7/site-packages/scanpydoc/elegant_typehints/return_tuple.py", line 46, in process_docstring
ret_types = get_tuple_annot(get_type_hints(obj).get("return"))
File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/typing.py", line 996, in get_type_hints
'or function.'.format(obj))
TypeError: <dispatched gearys_c> is not a module, class, method, or function.
Full error
# Sphinx version: 2.2.1
# Python version: 3.7.5 (CPython)
# Docutils version: 0.15.2 release
# Jinja2 version: 2.10.1
# Last messages:
# loading pickled environment...
# done
# [autosummary] generating autosummary for: api/index.rst, api/plotting.rst, api/scanpy.Neighbors.compute_eigen.rst, api/scanpy.Neighbors.compute_neighbors.rst, api/scanpy.Neighbors.compute_transitions.rst, api/scanpy.Neighbors.connectivities.rst, api/scanpy.Neighbors.distances.rst, api/scanpy.Neighbors.distances_dpt.rst, api/scanpy.Neighbors.eigen_basis.rst, api/scanpy.Neighbors.eigen_values.rst, ..., external/scanpy.external.tl.cyclone.rst, external/scanpy.external.tl.palantir.rst, external/scanpy.external.tl.phate.rst, external/scanpy.external.tl.phenograph.rst, external/scanpy.external.tl.sandbag.rst, index.rst, installation.rst, references.rst, release_notes.rst, tutorials.rst
# [autosummary] generating autosummary for: /Users/isaac/github/scanpy/docs/api/scanpy.metrics.confusion_matrix.rst, /Users/isaac/github/scanpy/docs/api/scanpy.metrics.gearys_c.rst
# building [mo]: targets for 0 po files that are out of date
# building [html]: targets for 6 source files that are out of date
# updating environment:
# [extensions changed ('5')]
# 242 added, 0 changed, 0 removed
# reading sources... [ 0%] api/index
# Loaded extensions:
# sphinx.ext.mathjax (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/mathjax.py
# sphinxcontrib.applehelp (1.0.1) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/applehelp/__init__.py
# sphinxcontrib.devhelp (1.0.1) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/devhelp/__init__.py
# sphinxcontrib.htmlhelp (1.0.2) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/htmlhelp/__init__.py
# sphinxcontrib.serializinghtml (1.1.3) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/serializinghtml/__init__.py
# sphinxcontrib.qthelp (1.0.2) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/qthelp/__init__.py
# alabaster (0.7.12) from /usr/local/lib/python3.7/site-packages/alabaster/__init__.py
# sphinx.ext.autodoc (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py
# sphinx.ext.intersphinx (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/intersphinx.py
# sphinx.ext.doctest (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/doctest.py
# sphinx.ext.coverage (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/coverage.py
# sphinx.ext.napoleon (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/napoleon/__init__.py
# sphinx.ext.autosummary (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/autosummary/__init__.py
# sphinx_autodoc_typehints (unknown version) from /usr/local/lib/python3.7/site-packages/sphinx_autodoc_typehints.py
# scanpydoc.autosummary_generate_imported (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/autosummary_generate_imported.py
# scanpydoc.definition_list_typed_field (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/definition_list_typed_field.py
# scanpydoc.elegant_typehints (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/elegant_typehints/__init__.py
# scanpydoc.rtd_github_links (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/rtd_github_links.py
# scanpydoc (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/__init__.py
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sphinx/cmd/build.py", line 276, in build_main
app.build(args.force_all, filenames)
File "/usr/local/lib/python3.7/site-packages/sphinx/application.py", line 346, in build
self.builder.build_update()
File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 299, in build_update
len(to_build))
File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 311, in build
updated_docnames = set(self.read())
File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 418, in read
self._read_serial(docnames)
File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 439, in _read_serial
self.read_doc(docname)
File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 479, in read_doc
doctree = read_doc(self.app, self.env, self.env.doc2path(docname))
File "/usr/local/lib/python3.7/site-packages/sphinx/io.py", line 326, in read_doc
pub.publish()
File "/usr/local/lib/python3.7/site-packages/docutils/core.py", line 217, in publish
self.settings)
File "/usr/local/lib/python3.7/site-packages/sphinx/io.py", line 114, in read
self.parse()
File "/usr/local/lib/python3.7/site-packages/docutils/readers/__init__.py", line 77, in parse
self.parser.parse(self.input, document)
File "/usr/local/lib/python3.7/site-packages/sphinx/parsers.py", line 95, in parse
self.statemachine.run(inputlines, document, inliner=self.inliner)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 171, in run
input_source=document['source'])
File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2771, in underline
self.section(title, source, style, lineno - 1, messages)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 327, in section
self.new_subsection(title, lineno, messages)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
node=section_node, match_titles=True)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
node=node, match_titles=match_titles)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
results = StateMachineWS.run(self, input_lines, input_offset)
File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2771, in underline
self.section(title, source, style, lineno - 1, messages)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 327, in section
self.new_subsection(title, lineno, messages)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
node=section_node, match_titles=True)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
node=node, match_titles=match_titles)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
results = StateMachineWS.run(self, input_lines, input_offset)
File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2344, in explicit_markup
nodelist, blank_finish = self.explicit_construct(match)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2356, in explicit_construct
return method(self, expmatch)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2099, in directive
directive_class, match, type_name, option_presets)
File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2148, in run_directive
result = directive_instance.run()
File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autosummary/__init__.py", line 240, in run
items = self.get_items(names)
File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autosummary/__init__.py", line 349, in get_items
documenter.add_content(None)
File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py", line 471, in add_content
for i, line in enumerate(self.process_doc(docstrings)):
File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py", line 441, in process_doc
self.options, docstringlines)
File "/usr/local/lib/python3.7/site-packages/sphinx/application.py", line 450, in emit
return self.events.emit(event, *args)
File "/usr/local/lib/python3.7/site-packages/sphinx/events.py", line 103, in emit
results.append(callback(self.app, *args))
File "/usr/local/lib/python3.7/site-packages/scanpydoc/elegant_typehints/return_tuple.py", line 46, in process_docstring
ret_types = get_tuple_annot(get_type_hints(obj).get("return"))
File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/typing.py", line 996, in get_type_hints
'or function.'.format(obj))
TypeError: <dispatched gearys_c> is not a module, class, method, or function.
I can’t really see the signature because of the corrupted ANSI escape sequences. Can you copy it from a terminal that lets you copy plain text?
the fix in scanpydoc seems easy!
I updated the header for the docstrings in the examples I gave.
OK, that looks good! Can you test it with theislab/scanpydoc#12?
I've responded on that PR
OK great! TODO:
- get multipledispatch support into scanpydoc
- add changelog entry
Would you mind handling that? I have zero experience with sphinx extensions.
Also, two API thoughts. For …. Second, for …: right now it has the seaborn-style argument handling shown at the top of this PR. I'm not sure that's really caught on in other packages or fits with scanpy.
Do you reckon it makes sense to make …?
Nah, I think it'd end up being at least the same amount of code converting values into different types.
Any updates on this PR? @ivirshup, I know some people in the lab really like your Geary's C implementation. Any thoughts on making it a small standalone package? Also, I do like the idea of the …
@adamgayoso, I should definitely get around to merging this. I think I can pretty much do it as is, and open a second issue for getting the docs looking good. I'd like to target an initial … Question for your lab: are our implementations equivalent? I haven't actually gotten around to testing against the …
I'd have to check with @deto or @mattjones315 about equivalency.
Also, as an aside: would it be appropriate to include some of @LuckyMD's scIB integration metrics here? It would give people easier access and probably expand general use.
At the moment we're trying to clean up …. That being said, maybe we could think about an optional requirement for scIB to integrate them? At least when we've cleaned up our side of things.
@LuckyMD, I've actually been meaning to ask you about this. I'd like to include at least some of those metrics here. In some cases it would mean making numba implementations (which would ultimately be easier to distribute). There would definitely be a prioritization for general usefulness of the metric. What do you think about that?
A numba reimplementation of some of the metrics sounds pretty awesome actually. That's out of scope for …
Just want to say that even a first release with some of the easiest-to-implement metrics could help lead to greater widespread use and IMO would generally be appreciated by the community. Besides the fact that it seems like a perfect fit for this scanpy module as I understand it. Though I do understand the citation issue. Maybe it's time for a global citation table, and each function can add to the table if there is an appropriate citation?! Maybe it could be accessed with …
So, a question from a user standpoint: is it worth it for us to include the really, really easy-to-implement metrics? The ones where we'd basically just be wrapping scikit-learn? I think this fits with the idea of …
It's definitely good to have a citation in the docstring for each function. For the docs of the metrics module, I think there would be a subsection for "Integration metrics", which could definitely point to …
Are you suggesting that the table would be added to at runtime (when a function is called)? I think this may be better addressed by a broader solution to "what has been done to this dataset?". I'm not sure how this could be done without buy-in from third-party libraries. This has also been discussed a bit previously: #472.
Yeah, I suppose, though I could see how this gets complicated by the fact that I imagine more people than myself use multiple h5ads throughout their analysis of the same dataset. Though maybe something simpler is to be able to access a global table of functions and citations that gives you the BibTeX. But if there's any interest in it (a bit of a weird idea, I understand), I can make an issue for it.
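For what it's worth, the global-table idea floated above could be as simple as a module-level registry populated by a decorator. A hypothetical sketch (none of these names exist in scanpy; the BibTeX entry is abbreviated for illustration):

```python
# Hypothetical global citation registry; not an existing scanpy API.
CITATIONS = {}


def cites(bibtex):
    """Decorator registering a BibTeX entry for a function."""
    def decorator(func):
        CITATIONS[func.__name__] = bibtex
        return func
    return decorator


@cites("@article{geary1954contiguity, author={Geary, R. C.}, year={1954}}")
def gearys_c(*args, **kwargs):
    ...


# A user could then look up what to cite for the functions they called:
print(CITATIONS["gearys_c"])
```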
@adamgayoso From my point of view, references are already available (source, rendered) and linked to in the documentation. I'm not against the idea, I'm just not seeing how it makes this information more accessible/prominent. A separate issue for the topic would be good for more discussion.
@LuckyMD, meant to say: …
Add a module for computing useful metrics. Started off with Geary's C since I'm using it and finding it useful. I've also got a fairly fast way to calculate it worked out. Unfortunately my implementation runs into some issues with some global configs set by umap (see lmcinnes/umap#306), so I'm going to see if that can be resolved before changing it.
* Work around lmcinnes/umap#306 by not calling out to kernel function. That code has been kept, but commented out.
* Increase numerical stability by casting data to system width. Tests were failing due to instability.
I think this is currently bugged by numba/numba#6774. It's a weird bug: some code just doesn't execute unless I swap out a …. It looks like this won't be solved by the next numba release, so working around it will be necessary for timely inclusion.
how'd this get in here
This reverts commit d2a8328.
* Add `sc.metrics` with `gearys_c`. Add a module for computing useful metrics. Started off with Geary's C since I'm using it and finding it useful. I've also got a fairly fast way to calculate it worked out. Unfortunately my implementation runs into some issues with some global configs set by umap (see lmcinnes/umap#306), so I'm going to see if that can be resolved before changing it.
* Add sc.metrics.confusion_matrix
* Better tests and output for confusion_matrix
* Workaround umap<0.4 and increase numerical stability of gearys_c
* Work around lmcinnes/umap#306 by not calling out to kernel function. That code has been kept, but commented out.
* Increase numerical stability by casting data to system width. Tests were failing due to instability.
* Split up gearys_c tests
* Improved unexpected error message
* gearys_c working again. Sadly, a bit slower
* One option for doc strings
* Simplify implementation to use single dispatch
* release notes
* add flake8 pre-commit (Signed-off-by: Zethson <[email protected]>)
* fix pre-commit
* add E402 to flake8 ignore
* revert neighbors
* fix flake8
* address review
* fix comment character in .flake8
* fix test
* black
* review round 2
* review round 3
* readded double comments
* Ignoring E262 & reverted comment
* using self for obs_tidy
* Restore setup.py
* rm call of black test (#1690)
* Fix print_versions for python<3.8 (#1691)
* add codecov so we can have a badge to point to (#1693)
* Attempt server-side search (#1672)
* Fix paga_path (#1047) (Co-authored-by: Isaac Virshup <[email protected]>)
* Switch to flit (reverts commit d645790); add setup.py while leaving it ignored
* Update install instructions
* Circumvent new pip check (see pypa/pip#9628)
* Go back to regular pip (#1702); go back to regular flit (Co-authored-by: Isaac Virshup <[email protected]>)
* codecov comment (#1704)
* Use joblib for parallelism in regress_out (#1695): release note; fix link in release notes; add todo for resource test
* Add sparsification step before sparse-dependent Scrublet calls (#1707): apply sparsification suggestion; fix imports (Co-authored-by: Isaac Virshup <[email protected]>)
* Fix version on Travis (#1713): by default, Travis does `git clone --depth=50`, which means the version can't be detected from the git tag.
* `sc.metrics` module (add confusion matrix & Geary's C methods) (#915): add `sc.metrics` with `gearys_c` (works around lmcinnes/umap#306 and casts data to system width for numerical stability); add `sc.metrics.confusion_matrix`; better tests and output for `confusion_matrix`; split up `gearys_c` tests; improved unexpected error message; one option for doc strings; simplify implementation to use single dispatch; release notes
* Fix clipped images in docs (#1717)
* Cleanup normalize_total (#1667): add modification tests and copy kwarg for normalize_total; test that 'layers' argument is deprecated; added more mutation checks for normalize_total; release note; error message
* deprecate scvi (#1703): update .azure-pipelines.yml; remove :func: links and tildes for scvi in release notes; update docs/release-notes/1.5.0.rst (Co-authored-by: Michael Jayasuriya <[email protected]>, Isaac Virshup <[email protected]>)
* updated ecosystem.rst to add triku (#1722)
* Minor addition to contributing docs (#1726)
* Preserve category order when groupby is a list (#1735)
* Asymmetrical diverging colormaps and vcenter (#1551): add vcenter and norm arguments to plotting functions
* flake8/pre-commit follow-ups: add flake8 pre-commit; add E402 to flake8 ignore; revert neighbors; address review; black; using self for obs_tidy; rebasing; add flake8 to dev docs; add autopep8 to pre-commits; add flake8 ignore docs; add exception todos; add ignore directories; reinstated lambdas; fix tests; Add E741 to allowed flake8 violations; Add F811 flake8 ignore for tests; Fix mask comparison; fix flake8 config file; readded autopep8; import Literal (later reverted); fix scatterplot pca import; false comparison & unused vars; Add cleaner level determination; Fix comment formatting; Add smoother dev documentation; Readd long comment; Assuming X as array like; fix flake8 config; reverted rank_genes; fix disp_mean_bin formatting; fix formatting; add final todos; boolean checks with is; _dpt formatting; literal fixes; links to leafs; revert paga variable naming

Co-authored-by: Philipp A <[email protected]>, Sergei Rybakov <[email protected]>, Isaac Virshup <[email protected]>, Jonathan Manning <[email protected]>, mjayasur <[email protected]>, Michael Jayasuriya <[email protected]>, Alex M. Ascensión <[email protected]>, Gökçen Eraslan <[email protected]>
This PR adds a module `sc.metrics` for functions which wouldn't modify an AnnData object, but are useful calculations. I'm basing this on `sklearn.metrics`, namely how `sklearn` has separated transformers (`sc.tl`) from measurements. I've started it with two functions, `confusion_matrix` and `gearys_c`, but think there are more use cases (e.g. `modularity`). I'm open to this not being a module, but I think these methods should be available, and I'm not sure where they'd fit within the current API.

My vision for this module is to make it easier to calculate values based on values you'd get using the scanpy ecosystem. Methods that would be included would either be a) not available in other libraries (`gearys_c`) or b) available, but with difficult interfaces (`confusion_matrix`).

`sc.metrics.confusion_matrix`

Creates a confusion matrix for comparing categorical labels. This is based on `sklearn.metrics.confusion_matrix`, but is easier to use and returns an object with labels. I think this is mostly done, though I'm considering changing the calling convention. Here's an example of usage: … I've copied `seaborn`'s calling convention here, but I think that could change. Right now the above call is equivalent to: … But I wonder if it would make more sense to have the DataFrame go first if it's provided. I've also based the API around my usage of confusion matrices, so I'm very open to more general feedback on this. My reason for including it here was the amount of code it took wrapping `sklearn.metrics.confusion_matrix` to get useful output.

`sc.metrics.gearys_c` (Wiki page)

Calculates autocorrelation of a measure on a network. Used in VISION for ranking gene sets. This is useful for finding out whether some per-cell measure is correlated with the structure of a connectivity graph. In practice, I've found it useful for identifying features that look good on a UMAP. It can also be useful for ranking components of dimensionality reductions (example with ICA: #767 (comment)).
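The seaborn-style calling convention described for `confusion_matrix` above might look like this sketch built on `pandas.crosstab` (an illustrative stand-in, not the PR's actual implementation; the `normalize` keyword is an assumption):

```python
import pandas as pd


def confusion_matrix(orig, new, data=None, normalize=True):
    """Labelled confusion matrix as a DataFrame (illustrative sketch).

    Seaborn-style arguments: `orig` and `new` may be column names looked
    up in `data`, or array-likes of labels themselves.
    """
    if data is not None:  # column-name form: confusion_matrix("a", "b", data=df)
        orig, new = data[orig], data[new]
    cm = pd.crosstab(pd.Series(orig, name="orig"), pd.Series(new, name="new"))
    if normalize:
        cm = cm.div(cm.sum(axis=1), axis=0)  # each row sums to one
    return cm


df = pd.DataFrame({"a": ["x", "x", "y"], "b": ["x", "y", "y"]})
# These two calls would be equivalent under this convention:
cm1 = confusion_matrix("a", "b", data=df)
cm2 = confusion_matrix(df["a"], df["b"])
```

The DataFrame-last convention mirrors seaborn's `x, y, data=` style; the alternative floated in the description would put the DataFrame first when provided.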