Pielou's Evenness can produce nan, causing downstream exceptions #226

wasade · 2018-12-06T16:32:05Z

Bug Description
Pielou's Evenness returns nan when a sample vector composed of a single feature is presented. At present, this gets represented as the empty string in the resulting output, and is re-represented as a nan in QIIME.

Downstream consumers of these data (e.g., alpha-group-significance) raise a divide by zero error when presented with these data (not show in the example below).

Steps to reproduce the behavior

In [13]: import qiime2

In [14]: import pandas as pd

In [15]: import skbio

In [16]: adiv = pd.Series([skbio.diversity.alpha._base.pielou_e(v) for v in ([0,1,2,3], [0,1,0,0], [1,2,3,0])],
    ...:                  index=['foo', 'bar', 'baz'])
    ...:

In [17]: adiv
Out[17]:
foo    0.92062
bar        NaN
baz    0.92062
dtype: float64

In [18]: ar = qiime2.Artifact.import_data('SampleData[AlphaDiversity]', adiv)

In [19]: reloaded = ar.view(pd.Series)

In [20]: reloaded
Out[20]:
foo    0.92062
bar        NaN
baz    0.92062
Name: 0, dtype: float64

Expected behavior
It's not clear yet whether this is the correct behavior.

Comments
After internal discussion, we thought it made sense to first open an issue here with q2-diversity as it is an open question whether we should allow nan in the output, whether consumers should be expected to gracefully handle these data, or whether this represents a bug in scikit-bio's Pielou's Evenness implementation.

This is an unusual pathological case, as it is unusual for a sample to contain only a single feature. However, this scenario arose on two independent executions of the C. diff FMT QIIME2 tutorial so it does appear to possible to occur with real data.

The text was updated successfully, but these errors were encountered:

antgonza · 2020-09-10T16:53:26Z

BTW, a Qiita user hit this issue yesterday while using core diversity analysis.

ChrisKeefe · 2020-09-24T19:37:02Z

q2-diversity-lib offers a _drop_undefined_samples utility, and this behavior is already wired up in the pielou_evenness implementation there. Undefined samples are not dropped by default, preferring to let users make that choice actively.

IIRC, our original decision not to expose this parameter in alpha and core-metrics was just about API simplicity. I suspect wiring it up would be pretty easy if we decide there's value in doing so.

antgonza · 2020-09-24T19:43:25Z

Good to know. IMOO it will be better to have that flag default to True (drop) as I'm not aware of any direct utility of keeping the Nones in the results but I might be missing something ... thoughts?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pielou's Evenness can produce nan, causing downstream exceptions #226

Pielou's Evenness can produce nan, causing downstream exceptions #226

wasade commented Dec 6, 2018

antgonza commented Sep 10, 2020

ChrisKeefe commented Sep 24, 2020

antgonza commented Sep 24, 2020