
sc.metrics module (add confusion matrix & Geary's C methods) #915

Merged (24 commits, Mar 4, 2021)

Conversation

ivirshup
Member

This PR adds a module sc.metrics for functions that are useful calculations but wouldn't modify an anndata object. I'm basing this on sklearn.metrics – namely, how sklearn separates transformers (sc.tl) from measurements. I've started it with two functions, confusion_matrix and gearys_c, but think there are more use cases (e.g. modularity). I'm open to this not being a module, but I think these methods should be available, and I'm not sure where they'd fit within the current API.

My vision for this module is to make it easier to calculate metrics on the values you'd get from the scanpy ecosystem. Methods included would be either (a) not available in other libraries (gearys_c) or (b) available, but with difficult interfaces (confusion_matrix).

sc.metrics.confusion_matrix

Creates a confusion matrix for comparing categorical labels. This is based on sklearn.metrics.confusion_matrix, but is easier to use and returns an object with labels. I think this is mostly done, though I'm considering changing the calling convention. Here's an example of usage:

import scanpy as sc
import seaborn as sns

pbmc = sc.datasets.pbmc68k_reduced()
sc.tl.leiden(pbmc)
sns.heatmap(sc.metrics.confusion_matrix("bulk_labels", "leiden", pbmc.obs))

[image: heatmap of the confusion matrix]

I've copied seaborn's calling convention here, but I think that could change. Right now the above call is equivalent to:

sc.metrics.confusion_matrix(pbmc.obs["bulk_labels"], pbmc.obs["leiden"])

But I wonder if it would make more sense to have the DataFrame go first if it's provided. I've also based the API around my usage of confusion matrices, so I'm very open to more general feedback on this. My reason for including it here was the amount of code it took wrapping sklearn.metrics.confusion_matrix to get useful output.
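For reference, the core tabulation a confusion matrix performs can be sketched in plain Python (a hypothetical minimal version for illustration – not scanpy's or sklearn's implementation):

```python
from collections import Counter

def confusion_matrix_sketch(orig_labels, new_labels):
    """Tabulate how often each original label co-occurs with each new label.

    Rows are original labels, columns are new labels -- a minimal stand-in
    for the labeled output the PR builds on top of sklearn.
    """
    orig_cats = sorted(set(orig_labels))
    new_cats = sorted(set(new_labels))
    counts = Counter(zip(orig_labels, new_labels))
    return {o: {n: counts[o, n] for n in new_cats} for o in orig_cats}

cm = confusion_matrix_sketch(["a", "a", "b"], ["x", "y", "x"])
print(cm)  # {'a': {'x': 1, 'y': 1}, 'b': {'x': 1, 'y': 0}}
```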

sc.metrics.gearys_c (Wiki page)

Calculates autocorrelation of a measure on a network. Used in VISION for ranking gene sets. This is useful for finding out whether some per-cell measure is correlated with the structure of a connectivity graph. In practice, I've found it useful for identifying features that look good on a UMAP:

import numpy as np
pbmc.layers["logcounts"] = pbmc.raw.X

%time gearys_c = sc.metrics.gearys_c(pbmc, layer="logcounts")
# CPU times: user 496 ms, sys: 3.88 ms, total: 500 ms
# Wall time: 74.9 ms
to_plot = pbmc.var_names[np.argsort(gearys_c)[:4]]
sc.pl.umap(pbmc, color=to_plot, ncols=2)

[image: UMAP plots colored by the four top-ranked features]

It can also be useful to rank components of dimensionality reductions (example with ICA: #767 (comment)).
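To make the definition concrete, here is Geary's C computed directly from its formula on a tiny dense graph (a pure-Python sketch for illustration; the PR's implementation operates on the sparse neighbor graph and is far faster):

```python
def gearys_c_dense(w, x):
    """Geary's C straight from the definition.

    w: dense (n, n) edge-weight matrix as nested lists; x: length-n values.
    Illustrative only -- not the PR's implementation.
    """
    n = len(x)
    x_bar = sum(x) / n
    big_w = sum(sum(row) for row in w)  # total edge weight W
    num = sum(
        w[i][j] * (x[i] - x[j]) ** 2
        for i in range(n) for j in range(n)
    )
    den = sum((xi - x_bar) ** 2 for xi in x)
    return (n - 1) * num / (2 * big_w * den)

# A 4-node path graph; values varying smoothly along it are highly
# autocorrelated with the graph structure, so C is well below 1.
w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(gearys_c_dense(w, [1.0, 2.0, 3.0, 4.0]))  # 0.3
```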

@LuckyMD
Contributor

LuckyMD commented Nov 13, 2019

The confusion matrix metric looks very similar to what sc.marker_genes_overlap() does. It might be worth moving that function over as well and potentially merging the functions. What do you think?

@flying-sheep
Member

Nice! Can you please explain your rationale for why they shouldn’t a) be normal tools and b) be saved into the AnnData object?

sc.tl.gearys_c(pbmc, layer="logcounts")
to_plot = pbmc.var_names[np.argsort(pbmc.var.gearys_c)[:4]]

Sure, adding more and more features is a good point at which to think about the API – I’d just like to hear why all current analysis tools belong in tl and these two don’t!

@ivirshup
Member Author

@LuckyMD

There are definitely some similarities, especially for the overlap_count case. I'm planning on adding support for categories encoded as boolean sparse matrices, which would make the calculation even more similar. I don't think they have enough of a common use case to be merged, however, especially since calculation is such a small part of both functions. sc.tl.marker_genes_overlap seems specific to differential expression, while confusion_matrix is specific to comparing sets of labels.

@flying-sheep

Why sc.metrics and not sc.tl? It doesn't feel like it fits. My expectation for most things in sc.tl and sc.pp is that they will take an anndata object and modify it. One of these functions takes an anndata object (optionally) and neither modifies it. I would agree with @LuckyMD that sc.tl.marker_gene_overlap could also fit in sc.metrics. There are possibly more philosophical points to be made about the difference between a "measure" and a "representation" but I'd have to think on that.

As to not modifying the AnnData object: it's not obvious to me how these functions would modify the object in a helpful way. I don't expect the outputs of these functions to have a similar shape to the AnnData object particularly often. Admittedly, the example I included does, but I'm not sure that'll be a common use case. You could whack it in .uns, but then there's either some default key or the user has to specify one. It'd be pretty straightforward for the user to do adata.uns["some key"] = sc.metrics.some_func.

I guess I'd summarize this as: I don't see added value from adding the results to the object – but I do see more complexity in implementing that.

Member

@flying-sheep flying-sheep left a comment

Great! Love your clean implementations.

scanpy/metrics/_gearys_c.py (outdated review thread)
Comment on lines 216 to 232
# TODO: Document better
# TODO: Have scanpydoc work with multipledispatch
Member

Changes to scanpydoc are online fast – do you have something in mind?

Member Author

I'm pretty open to suggestions on this.

We could do something similar to R? There, they give all the signatures and then list the arguments for all of them together.

Is it possible to have multiple Params sections in Sphinx? Otherwise, maybe we find functions which have had multiple dispatch applied, then know to stitch their docstrings together somehow?

I feel like languages that use dispatch more heavily (and not like object.method) tend to have more narrative docs with examples of usage.
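For comparison, the stdlib's functools.singledispatch (single dispatch rather than the multipledispatch library used here) keeps a separate __doc__ on each registered implementation, which a doc tool could collect and stitch together – a small sketch with hypothetical names:

```python
from functools import singledispatch

@singledispatch
def gearys_c_demo(arg, vals=None):
    """Generic entry point (hypothetical): dispatches on the first argument."""
    raise NotImplementedError(type(arg))

@gearys_c_demo.register(list)
def _(arg, vals=None):
    """Implementation for plain lists."""
    return "list implementation"

# Each registered implementation keeps its own __doc__, which a Sphinx
# extension could render as one combined docstring.
for cls, impl in gearys_c_demo.registry.items():
    print(cls.__name__, "->", impl.__doc__)
```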

Member

What’s the current output? Do we at least have the docstring and signature of one of them?

Member Author

@ivirshup ivirshup Nov 29, 2019

?sc.metrics.gearys_c looks like:

Dispatched docstring
Signature:   sc.metrics.gearys_c(*args, **kwargs)
Type:        Dispatcher
String form: <dispatched gearys_c>
File:        /usr/local/lib/python3.7/site-packages/multipledispatch/dispatcher.py
Docstring:
Multiply dispatched method: gearys_c

Inputs: <AnnData>
------------------
Calculate `Geary's C <https://en.wikipedia.org/wiki/Geary's_C>`_, as used
    by `VISION <https://doi.org/10.1038/s41467-019-12235-0>`_.

    Geary's C is a measure of autocorrelation for some measure on a graph. This
    can be used to assess whether measures are correlated between neighboring
    cells. Lower values indicate greater correlation.

    .. math::

        C =
        \frac{
            (N - 1)\sum_{i,j} w_{i,j} (x_i - x_j)^2
        }{
            2W \sum_i (x_i - \bar{x})^2
        }

    Params
    ------
    adata
    vals
        Values to calculate Geary's C for. If this is two dimensional, should
        be of shape `(n_features, n_cells)`. Otherwise should be of shape
        `(n_cells,)`. This matrix can be selected from elements of the anndata
        object by using key word arguments: `layer`, `obsm`, `obsp`, or
        `use_raw`.
    use_graph
        Key to use for graph in anndata object. If not provided, default
        neighbors connectivities will be used instead.
    layer
        Key for `adata.layers` to choose `vals`.
    obsm
        Key for `adata.obsm` to choose `vals`.
    obsp
        Key for `adata.obsp` to choose `vals`.
    use_raw
        Whether to use `adata.raw.X` for `vals`.

    Returns
    -------
    If `vals` is two dimensional, returns a 1-dimensional ndarray. Returns
    a scalar if `vals` is 1d.

Other signatures:
    spmatrix, spmatrix
    csr_matrix, csr_matrix
    spmatrix, ndarray
    spmatrix, DataFrame
    spmatrix, Series

So not that much of a signature. I added a doc-string to one of the other methods and got this:

Two doc-strings
Signature:   sc.metrics.gearys_c(*args, **kwargs)
Type:        Dispatcher
String form: <dispatched gearys_c>
File:        /usr/local/lib/python3.7/site-packages/multipledispatch/dispatcher.py
Docstring:  
Multiply dispatched method: gearys_c

Inputs: <spmatrix, ndarray>
----------------------------
Params
    ------
    g
        Connectivity graph as a scipy sparse matrix. Should have shape:
        `(n_obs, n_obs)`.
    vals
        Values to calculate Geary's C for. If one dimensional, should have
        shape `(n_obs,)`. If two dimensional (i.e calculating Geary's C for
        multiple variables) should have shape `(n_vars, n_obs)`.

Inputs: <AnnData>
------------------
Calculate `Geary's C <https://en.wikipedia.org/wiki/Geary's_C>`_, as used
    by `VISION <https://doi.org/10.1038/s41467-019-12235-0>`_.

    Geary's C is a measure of autocorrelation for some measure on a graph. This
    can be used to assess whether measures are correlated between neighboring
    cells. Lower values indicate greater correlation.

    .. math::

        C =
        \frac{
            (N - 1)\sum_{i,j} w_{i,j} (x_i - x_j)^2
        }{
            2W \sum_i (x_i - \bar{x})^2
        }

    Params
    ------
    adata
    vals
        Values to calculate Geary's C for. If this is two dimensional, should
        be of shape `(n_features, n_cells)`. Otherwise should be of shape
        `(n_cells,)`. This matrix can be selected from elements of the anndata
        object by using key word arguments: `layer`, `obsm`, `obsp`, or
        `use_raw`.
    use_graph
        Key to use for graph in anndata object. If not provided, default
        neighbors connectivities will be used instead.
    layer
        Key for `adata.layers` to choose `vals`.
    obsm
        Key for `adata.obsm` to choose `vals`.
    obsp
        Key for `adata.obsp` to choose `vals`.
    use_raw
        Whether to use `adata.raw.X` for `vals`.

    Returns
    -------
    If `vals` is two dimensional, returns a 1-dimensional ndarray. Returns
    a scalar if `vals` is 1d.

Other signatures:
    spmatrix, spmatrix
    csr_matrix, csr_matrix
    spmatrix, DataFrame
    spmatrix, Series

Member Author

Scanpydoc throws an error if I try to add this to the docs. The full log is below, but I think these are the relevant parts of the traceback:

  File "/usr/local/lib/python3.7/site-packages/sphinx/events.py", line 103, in emit
    results.append(callback(self.app, *args))
  File "/usr/local/lib/python3.7/site-packages/scanpydoc/elegant_typehints/return_tuple.py", line 46, in process_docstring
    ret_types = get_tuple_annot(get_type_hints(obj).get("return"))
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/typing.py", line 996, in get_type_hints
    'or function.'.format(obj))
TypeError: <dispatched gearys_c> is not a module, class, method, or function.
Full error
# Sphinx version: 2.2.1
# Python version: 3.7.5 (CPython)
# Docutils version: 0.15.2 release
# Jinja2 version: 2.10.1
# Last messages:
#   loading pickled environment...
#   done
#   [autosummary] generating autosummary for: api/index.rst, api/plotting.rst, api/scanpy.Neighbors.compute_eigen.rst, api/scanpy.Neighbors.compute_neighbors.rst, api/scanpy.Neighbors.compute_transitions.rst, api/scanpy.Neighbors.connectivities.rst, api/scanpy.Neighbors.distances.rst, api/scanpy.Neighbors.distances_dpt.rst, api/scanpy.Neighbors.eigen_basis.rst, api/scanpy.Neighbors.eigen_values.rst, ..., external/scanpy.external.tl.cyclone.rst, external/scanpy.external.tl.palantir.rst, external/scanpy.external.tl.phate.rst, external/scanpy.external.tl.phenograph.rst, external/scanpy.external.tl.sandbag.rst, index.rst, installation.rst, references.rst, release_notes.rst, tutorials.rst
#   [autosummary] generating autosummary for: /Users/isaac/github/scanpy/docs/api/scanpy.metrics.confusion_matrix.rst, /Users/isaac/github/scanpy/docs/api/scanpy.metrics.gearys_c.rst
#   building [mo]: targets for 0 po files that are out of date
#   building [html]: targets for 6 source files that are out of date
#   updating environment:
#   [extensions changed ('5')]
#   242 added, 0 changed, 0 removed
#   reading sources... [  0%] api/index
# Loaded extensions:
#   sphinx.ext.mathjax (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/mathjax.py
#   sphinxcontrib.applehelp (1.0.1) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/applehelp/__init__.py
#   sphinxcontrib.devhelp (1.0.1) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/devhelp/__init__.py
#   sphinxcontrib.htmlhelp (1.0.2) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/htmlhelp/__init__.py
#   sphinxcontrib.serializinghtml (1.1.3) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/serializinghtml/__init__.py
#   sphinxcontrib.qthelp (1.0.2) from /usr/local/lib/python3.7/site-packages/sphinxcontrib/qthelp/__init__.py
#   alabaster (0.7.12) from /usr/local/lib/python3.7/site-packages/alabaster/__init__.py
#   sphinx.ext.autodoc (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py
#   sphinx.ext.intersphinx (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/intersphinx.py
#   sphinx.ext.doctest (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/doctest.py
#   sphinx.ext.coverage (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/coverage.py
#   sphinx.ext.napoleon (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/napoleon/__init__.py
#   sphinx.ext.autosummary (2.2.1) from /usr/local/lib/python3.7/site-packages/sphinx/ext/autosummary/__init__.py
#   sphinx_autodoc_typehints (unknown version) from /usr/local/lib/python3.7/site-packages/sphinx_autodoc_typehints.py
#   scanpydoc.autosummary_generate_imported (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/autosummary_generate_imported.py
#   scanpydoc.definition_list_typed_field (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/definition_list_typed_field.py
#   scanpydoc.elegant_typehints (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/elegant_typehints/__init__.py
#   scanpydoc.rtd_github_links (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/rtd_github_links.py
#   scanpydoc (0.4.4) from /usr/local/lib/python3.7/site-packages/scanpydoc/__init__.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sphinx/cmd/build.py", line 276, in build_main
    app.build(args.force_all, filenames)
  File "/usr/local/lib/python3.7/site-packages/sphinx/application.py", line 346, in build
    self.builder.build_update()
  File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 299, in build_update
    len(to_build))
  File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 311, in build
    updated_docnames = set(self.read())
  File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 418, in read
    self._read_serial(docnames)
  File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 439, in _read_serial
    self.read_doc(docname)
  File "/usr/local/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 479, in read_doc
    doctree = read_doc(self.app, self.env, self.env.doc2path(docname))
  File "/usr/local/lib/python3.7/site-packages/sphinx/io.py", line 326, in read_doc
    pub.publish()
  File "/usr/local/lib/python3.7/site-packages/docutils/core.py", line 217, in publish
    self.settings)
  File "/usr/local/lib/python3.7/site-packages/sphinx/io.py", line 114, in read
    self.parse()
  File "/usr/local/lib/python3.7/site-packages/docutils/readers/__init__.py", line 77, in parse
    self.parser.parse(self.input, document)
  File "/usr/local/lib/python3.7/site-packages/sphinx/parsers.py", line 95, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 171, in run
    input_source=document['source'])
  File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2771, in underline
    self.section(title, source, style, lineno - 1, messages)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 327, in section
    self.new_subsection(title, lineno, messages)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
    node=section_node, match_titles=True)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2771, in underline
    self.section(title, source, style, lineno - 1, messages)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 327, in section
    self.new_subsection(title, lineno, messages)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
    node=section_node, match_titles=True)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/local/lib/python3.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2344, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2356, in explicit_construct
    return method(self, expmatch)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2099, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/local/lib/python3.7/site-packages/docutils/parsers/rst/states.py", line 2148, in run_directive
    result = directive_instance.run()
  File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autosummary/__init__.py", line 240, in run
    items = self.get_items(names)
  File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autosummary/__init__.py", line 349, in get_items
    documenter.add_content(None)
  File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py", line 471, in add_content
    for i, line in enumerate(self.process_doc(docstrings)):
  File "/usr/local/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py", line 441, in process_doc
    self.options, docstringlines)
  File "/usr/local/lib/python3.7/site-packages/sphinx/application.py", line 450, in emit
    return self.events.emit(event, *args)
  File "/usr/local/lib/python3.7/site-packages/sphinx/events.py", line 103, in emit
    results.append(callback(self.app, *args))
  File "/usr/local/lib/python3.7/site-packages/scanpydoc/elegant_typehints/return_tuple.py", line 46, in process_docstring
    ret_types = get_tuple_annot(get_type_hints(obj).get("return"))
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/typing.py", line 996, in get_type_hints
    'or function.'.format(obj))
TypeError: <dispatched gearys_c> is not a module, class, method, or function.

Member

I can’t really see the signature because of the corrupted ANSI escape sequences. Can you copy it from a terminal that lets you copy plain text?

the fix in scanpydoc seems easy!

Member Author

I updated the headers for the docstrings in the examples I gave.

Member

OK, that looks good! Can you test it with theislab/scanpydoc#12?

Member Author

I've responded on that PR.

scanpy/tests/test_metrics.py (two review threads, one outdated)
scanpy/metrics/_metrics.py (two outdated review threads)
scanpy/metrics/_gearys_c.py (three outdated review threads)
Member

@flying-sheep flying-sheep left a comment

OK great! TODO:

  • get multipledispatch support into scanpydoc
  • add changelog entry

@ivirshup
Member Author

get multipledispatch support into scanpydoc

Would you mind handling that? I have zero experience with sphinx extensions.

@ivirshup
Member Author

Also, two API thoughts:

For sc.metrics.gearys_c(a: "array", b: "array"), a 2d b is expected to have a shape like (n_variables, n_cells) – the ufunc shape signature would be (m,m),(n,m)->(n,). This is because the computation needs fast access to each variable, so variables correspond to rows. The length of the returned array also depends on the first axis of the input. Is this intuitive, or should the input be transposed?
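Concretely, under this convention a 2d input has variables along the first axis and the output has one entry per variable – a trivial pure-Python illustration with a stand-in reduction (not the real Geary's C computation):

```python
# vals has shape (n_variables, n_cells): each row is one variable.
def per_variable_scores(vals):
    # Stand-in per-row reduction; the real function would compute
    # Geary's C for each row against a shared connectivity graph.
    return [sum(row) / len(row) for row in vals]

vals = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 variables, 3 cells
print(per_variable_scores(vals))  # [2.0, 5.0] -- length matches the first axis
```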

Second, for confusion_matrix, I'm thinking I should make it singly dispatched on the first argument. This way, if a dataframe is passed, the next two arguments could correspond to keys in that dataframe; otherwise, vectors can be passed directly. Under that, these calls would be equivalent:

sc.metrics.confusion_matrix(adata.obs, "sample_labels", "leiden")
sc.metrics.confusion_matrix(adata.obs["sample_labels"], adata.obs["leiden"])

Right now it has the seaborn-style argument handling shown at the top of this PR. I'm not sure that's really caught on in other packages or fits with scanpy.
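A single-dispatch version of that calling convention could be sketched like this (hypothetical names, not the PR's code; a plain dict stands in for the DataFrame to keep the sketch dependency-free):

```python
from functools import singledispatch

@singledispatch
def confusion_matrix_api(x, y):
    """Fallback: two label vectors passed directly."""
    return list(zip(x, y))

@confusion_matrix_api.register(dict)  # stand-in for pd.DataFrame
def _(data, x_key, y_key):
    """If a mapping comes first, the next two arguments are keys into it."""
    return confusion_matrix_api(data[x_key], data[y_key])

obs = {"sample_labels": ["a", "b"], "leiden": ["0", "1"]}
# The two proposed call styles give the same result:
assert (confusion_matrix_api(obs, "sample_labels", "leiden")
        == confusion_matrix_api(obs["sample_labels"], obs["leiden"]))
```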

@LuckyMD
Contributor

LuckyMD commented Dec 1, 2019

Do you reckon it makes sense to make sc.tl.marker_genes_overlap() use this code internally to reduce code redundancy?

@ivirshup
Member Author

ivirshup commented Dec 2, 2019

Nah, I think it'd end up being at least the same amount of code converting values into different types.

@adamgayoso
Member

adamgayoso commented Jan 20, 2021

Any updates on this PR? @ivirshup, I know some people in the lab really like your Geary's C implementation. Any thoughts on making it a small standalone package?

Also I do like the idea of the sc.metrics module if it makes more sense here.

@ivirshup ivirshup added this to the 1.8.0 milestone Jan 20, 2021
@ivirshup
Member Author

@adamgayoso, I should definitely get around to merging this. I think I can pretty much do it as is, and open a second issue for getting the docs looking good. I'd like to target an initial metrics module for 1.8 (we're working on upping the release cadence as well).

Question for your lab: are our implementations equivalent? I haven't actually gotten around to testing against the VISION R/C++ version.

@adamgayoso
Member

I'd have to check with @deto or @mattjones315 about equivalency.

@adamgayoso
Member

Also, as an aside, would it be appropriate to include some of @LuckyMD's scIB integration metrics here? It would give people easier access and probably expand general use.

@LuckyMD
Contributor

LuckyMD commented Jan 20, 2021

At the moment we're trying to clean up scIB so that it becomes easier to use. We're still not certain how best to deal with metrics that rely on R and C++ code, though. The current plan is to make a more usable PyPI package where some metrics give you a warning about additional requirements/manual C++ compilation. Apologies for the usability mess that a package that also assesses usability has become ^^. I'd prefer to keep it separate for now to facilitate maintenance and citation, though.

That being said, maybe we could think about an optional requirement on scIB to integrate them? At least once we've cleaned up our side of things.

@ivirshup
Member Author

ivirshup commented Jan 21, 2021

@LuckyMD, I've actually been meaning to ask you about this. I'd like to include at least some of those metrics here. In some cases this would mean writing numba implementations (which would ultimately be easier to distribute). There would definitely be a prioritization by general usefulness of the metric.

What do you think about that?

@LuckyMD
Contributor

LuckyMD commented Jan 21, 2021

A numba reimplementation of some of the metrics sounds pretty awesome, actually. That's out of scope for scIB at the moment. We didn't bother with parallelization for most of the metrics (beyond what was already implemented in sc.tl.louvain and the sklearn dependencies), as the slowest ones were in R anyway (and now also C++ with our LISI update). I'd really welcome that. I can help where I can, although I'm not so familiar with numba.

@adamgayoso
Member

adamgayoso commented Jan 21, 2021

Just want to say that even a first release with some of the easiest-to-implement metrics could lead to more widespread use and IMO would generally be appreciated by the community. Besides, it seems like a perfect fit for this scanpy module as I understand it.

Though I do understand the citation issue. Maybe it's time for a global citation table and each function can add to the table if there is an appropriate citation?! Maybe it could be accessed with sc.citation_table and displays which function calls used which paper's methods.

@ivirshup
Member Author

ivirshup commented Jan 21, 2021

So, a question from a user standpoint:

Is it worth it for us to include the really, really easy-to-implement metrics – the ones where we'd basically just be wrapping scikit-learn? I think this fits with the idea of scanpy's contents being curatorial to some extent.

Though I do understand the citation issue.

It's definitely good to have a citation in the docstring for each function.

For the docs of the metrics module, I think there would be a subsection for "Integration metrics" which could definitely point to scIB as a more comprehensive package for evaluating integration.

Maybe it's time for a global citation table and each function can add to the table if there is an appropriate citation?!

Are you suggesting that the table would be added to at runtime (when a function is called)? I think this may be better addressed by a broader solution to "what has been done to this dataset?". I'm not sure how this could be done without buy-in from third-party libraries. It has also been discussed a bit previously: #472.

@adamgayoso
Member

Are you suggesting that the table would be added to at runtime (when a function is called)? I think this may be better addressed by a broader solution to "what has been done to this dataset?". I'm not sure how this could be done without buy in from third party libraries. Also has been discussed a bit previously: #472.

Yeah, I suppose, though I can see how this gets complicated by the fact that I imagine more people than just me use multiple h5ads throughout their analysis of the same dataset. Though maybe something simpler would be a global table of functions and citations that gives you the bibtex. But if there's any interest in it (a bit of a weird idea, I understand), I can make an issue for it.

@ivirshup
Member Author

Though maybe something simpler is to be able to access a global table of functions and citations, and it gives you the bibtex.

@adamgayoso From my point of view, references are already available (source, rendered) and linked to in the documentation. I'm not against the idea; I'm just not seeing how it makes this information more accessible/prominent. A separate issue would be a good place for more discussion on the topic.


Would really welcome that. I can help where I can, although not so familiar with numba.

@LuckyMD, meant to say, numba is super easy, it's really just python. Next best thing to Julia. Definitely worth some time to learn, but also it won't take that much time to learn.

Add a module for computing useful metrics. Started off with Geary's C since I'm using it and finding it useful. I've also got a fairly fast way to calculate it worked out.

Unfortunately my implementation runs into some issues with some global configs set by umap (see lmcinnes/umap#306), so I'm going to see if that can be resolved before changing it.
* Work around lmcinnes/umap#306 by not
  calling out to kernel function. That code has been kept, but commented
  out.
* Increase numerical stability by casting data to system width. Tests
  were failing due to instability.
@ivirshup
Member Author

ivirshup commented Mar 2, 2021

I think this is currently blocked by numba/numba#6774. It's a weird bug: some code just doesn't execute, unless I swap the prange out for a range, in which case it errors. Unless I add an expression that does nothing – then it can work, except it's doing the expensive computation again 🤯.

It looks like this won't be solved by the next numba release, so working around it will be necessary for timely inclusion.

@ivirshup ivirshup changed the title [RFC] sc.metrics module (add confusion matrix & Geary's C methods) sc.metrics module (add confusion matrix & Geary's C methods) Mar 3, 2021
@ivirshup ivirshup merged commit b3a2a6d into scverse:master Mar 4, 2021
Zethson pushed a commit that referenced this pull request Mar 15, 2021
* Add `sc.metrics` with `gearys_c`

Add a module for computing useful metrics. Started off with Geary's C since I'm using it and finding it useful. I've also got a fairly fast way to calculate it worked out.

Unfortunately my implementation runs into some issues with some global configs set by umap (see lmcinnes/umap#306), so I'm going to see if that can be resolved before changing it.

* Add sc.metrics.confusion_matrix

* Better tests and output for confusion_matrix

* Workaround umap<0.4 and increase numerical stability of gearys_c

* Work around lmcinnes/umap#306 by not
  calling out to kernel function. That code has been kept, but commented
  out.
* Increase numerical stability by casting data to system width. Tests
  were failing due to instability.

* Split up gearys_c tests

* Improved unexpected error message

* gearys_c working again. Sadly, a bit slower

* One option for doc strings

* Simplify implementation to use single dispatch

* release notes
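The `confusion_matrix` addition listed above can be sketched in plain Python (a hypothetical standalone version for illustration; the scanpy function wraps sklearn.metrics.confusion_matrix and returns a labeled DataFrame):

```python
from collections import Counter

def confusion_matrix(orig, new):
    """Return sorted row labels, sorted column labels, and a matrix
    where entry [i][j] counts observations labeled rows[i] in `orig`
    and cols[j] in `new`."""
    counts = Counter(zip(orig, new))  # missing pairs count as 0
    rows = sorted(set(orig))
    cols = sorted(set(new))
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix
```

Keeping the labels attached to the counts is the main convenience over calling sklearn directly, where the row/column order has to be tracked separately.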
ivirshup added a commit that referenced this pull request Mar 18, 2021
* add flake8 pre-commit

Signed-off-by: Zethson <[email protected]>

* fix pre-commit

Signed-off-by: Zethson <[email protected]>

* add E402 to flake8 ignore

Signed-off-by: Zethson <[email protected]>

* revert neighbors

Signed-off-by: Zethson <[email protected]>

* fix flake8

Signed-off-by: Zethson <[email protected]>

* address review

Signed-off-by: Zethson <[email protected]>

* fix comment character in .flake8

Signed-off-by: Zethson <[email protected]>

* fix test

Signed-off-by: Zethson <[email protected]>

* black

Signed-off-by: Zethson <[email protected]>

* review round 2

Signed-off-by: Zethson <[email protected]>

* review round 3

Signed-off-by: Zethson <[email protected]>

* readded double comments

Signed-off-by: Zethson <[email protected]>

* Ignoring E262 & reverted comment

Signed-off-by: Zethson <[email protected]>

* using self for obs_tidy

Signed-off-by: Zethson <[email protected]>

* Restore setup.py

* rm call of black test (#1690)

* Fix print_versions for python<3.8 (#1691)

* add codecov so we can have a badge to point to (#1693)

* Attempt server-side search (#1672)

* Fix paga_path (#1047)

Fix paga_path

Co-authored-by: Isaac Virshup <[email protected]>

* Switch to flit

This reverts commit d645790

* add setup.py while leaving it ignored

* Update install instructions

* Circumvent new pip check (see pypa/pip#9628)

* Go back to regular pip (#1702)

* Go back to regular flit

Co-authored-by: Isaac Virshup <[email protected]>

* codecov comment (#1704)

* Use joblib for parallelism in regress_out (#1695)

* Use joblib for parallelism in regress_out

* release note

* fix link in release notes

* Add todo for resource test

* Add sparsification step before sparse-dependent Scrublet calls (#1707)

* Add sparsification step before sparse-dependent Scrublet calls

* Apply sparsification suggestion

Co-authored-by: Isaac Virshup <[email protected]>

* Fix imports

Co-authored-by: Isaac Virshup <[email protected]>

* Fix version on Travis (#1713)

By default, Travis does `git clone --depth=50` which means the version can’t be detected from the git tag.

* `sc.metrics` module (add confusion matrix & Geary's C methods) (#915)


* Fix clipped images in docs (#1717)

* Cleanup normalize_total (#1667)

* Cleanup normalize_total

* Add modification tests and copy kwarg for normalize_total

* Test that 'layers' argument is deprecated

* Added more mutation checks for normalize_total

* release note

* Error message

* deprecate scvi (#1703)

* deprecate scvi

* Update .azure-pipelines.yml

Co-authored-by: Isaac Virshup <[email protected]>

* remove :func: links to scvi in release notes

* remove tildes in front of scvi in release notes

* Update docs/release-notes/1.5.0.rst

Co-authored-by: Michael Jayasuriya <[email protected]>
Co-authored-by: Isaac Virshup <[email protected]>

* updated ecosystem.rst to add triku (#1722)

* Minor addition to contributing docs (#1726)

* Preserve category order when groupby is a list (#1735)

Preserve category order when groupby is a list

* Asymmetrical diverging colormaps and vcenter (#1551)

Add vcenter and norm arguments to plotting functions

* add flake8 pre-commit

Signed-off-by: Zethson <[email protected]>

* add E402 to flake8 ignore

Signed-off-by: Zethson <[email protected]>

* revert neighbors

Signed-off-by: Zethson <[email protected]>

* address review

Signed-off-by: Zethson <[email protected]>

* black

Signed-off-by: Zethson <[email protected]>

* using self for obs_tidy

Signed-off-by: Zethson <[email protected]>

* rebased

Signed-off-by: Zethson <[email protected]>

* rebasing

Signed-off-by: Zethson <[email protected]>

* rebasing

Signed-off-by: Zethson <[email protected]>

* rebasing

Signed-off-by: Zethson <[email protected]>

* add flake8 to dev docs

Signed-off-by: Zethson <[email protected]>

* add autopep8 to pre-commits

Signed-off-by: Zethson <[email protected]>

* add flake8 ignore docs

Signed-off-by: Zethson <[email protected]>

* add exception todos

Signed-off-by: Zethson <[email protected]>

* add ignore directories

Signed-off-by: Zethson <[email protected]>

* reinstated lambdas

Signed-off-by: Zethson <[email protected]>

* fix tests

Signed-off-by: Zethson <[email protected]>

* fix tests

Signed-off-by: Zethson <[email protected]>

* fix tests

Signed-off-by: Zethson <[email protected]>

* fix tests

Signed-off-by: Zethson <[email protected]>

* fix tests

Signed-off-by: Zethson <[email protected]>

* Add E741 to allowed flake8 violations.

Co-authored-by: Isaac Virshup <[email protected]>

* Add F811 flake8 ignore for tests

Co-authored-by: Isaac Virshup <[email protected]>

* Fix mask comparison

Co-authored-by: Isaac Virshup <[email protected]>

* Fix mask comparison

Co-authored-by: Isaac Virshup <[email protected]>

* fix flake8 config file

Signed-off-by: Zethson <[email protected]>

* readded autopep8

Signed-off-by: Zethson <[email protected]>

* import Literal

Signed-off-by: Zethson <[email protected]>

* revert literal import

Signed-off-by: Zethson <[email protected]>

* fix scatterplot pca import

Signed-off-by: Zethson <[email protected]>

* false comparison & unused vars

Signed-off-by: Zethson <[email protected]>

* Add cleaner level determination

Co-authored-by: Isaac Virshup <[email protected]>

* Fix comment formatting

Co-authored-by: Isaac Virshup <[email protected]>

* Add smoother dev documentation

Co-authored-by: Isaac Virshup <[email protected]>

* fix flake8

Signed-off-by: Zethson <[email protected]>

* Readd long comment

Co-authored-by: Isaac Virshup <[email protected]>

* Assuming X as array like

Co-authored-by: Isaac Virshup <[email protected]>

* fix flake8

Signed-off-by: Zethson <[email protected]>

* fix flake8 config

Signed-off-by: Zethson <[email protected]>

* reverted rank_genes

Signed-off-by: Zethson <[email protected]>

* fix disp_mean_bin formatting

Co-authored-by: Isaac Virshup <[email protected]>

* fix formatting

Signed-off-by: Zethson <[email protected]>

* add final todos

Signed-off-by: Zethson <[email protected]>

* boolean checks with is

Signed-off-by: Zethson <[email protected]>

* _dpt formatting

Signed-off-by: Zethson <[email protected]>

* literal fixes

Signed-off-by: Zethson <[email protected]>

* links to leafs

Signed-off-by: Zethson <[email protected]>

* revert paga variable naming

Co-authored-by: Philipp A <[email protected]>
Co-authored-by: Sergei Rybakov <[email protected]>
Co-authored-by: Isaac Virshup <[email protected]>
Co-authored-by: Jonathan Manning <[email protected]>
Co-authored-by: mjayasur <[email protected]>
Co-authored-by: Michael Jayasuriya <[email protected]>
Co-authored-by: Alex M. Ascensión <[email protected]>
Co-authored-by: Gökçen Eraslan <[email protected]>