Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pielou's Evenness can produce nan, causing downstream exceptions #226

Open
wasade opened this issue Dec 6, 2018 · 3 comments
Open

Pielou's Evenness can produce nan, causing downstream exceptions #226

wasade opened this issue Dec 6, 2018 · 3 comments

Comments

@wasade
Copy link
Member

wasade commented Dec 6, 2018

Bug Description
Pielou's Evenness returns nan when a sample vector composed of a single feature is presented. At present, this gets represented as the empty string in the resulting output, and is re-represented as a nan in QIIME.

Downstream consumers of these data (e.g., alpha-group-significance) raise a divide by zero error when presented with these data (not show in the example below).

Steps to reproduce the behavior

In [13]: import qiime2

In [14]: import pandas as pd

In [15]: import skbio

In [16]: adiv = pd.Series([skbio.diversity.alpha._base.pielou_e(v) for v in ([0,1,2,3], [0,1,0,0], [1,2,3,0])],
    ...:                  index=['foo', 'bar', 'baz'])
    ...:

In [17]: adiv
Out[17]:
foo    0.92062
bar        NaN
baz    0.92062
dtype: float64

In [18]: ar = qiime2.Artifact.import_data('SampleData[AlphaDiversity]', adiv)

In [19]: reloaded = ar.view(pd.Series)

In [20]: reloaded
Out[20]:
foo    0.92062
bar        NaN
baz    0.92062
Name: 0, dtype: float64

Expected behavior
It's not clear yet whether this is the correct behavior.

Comments
After internal discussion, we thought it made sense to first open an issue here with q2-diversity as it is an open question whether we should allow nan in the output, whether consumers should be expected to gracefully handle these data, or whether this represents a bug in scikit-bio's Pielou's Evenness implementation.

This is an unusual pathological case, as it is unusual for a sample to contain only a single feature. However, this scenario arose on two independent executions of the C. diff FMT QIIME2 tutorial so it does appear to possible to occur with real data.

@antgonza
Copy link
Member

BTW, a Qiita user hit this issue yesterday while using core diversity analysis.

@ChrisKeefe
Copy link
Collaborator

q2-diversity-lib offers a _drop_undefined_samples utility, and this behavior is already wired up in the pielou_evenness implementation there. Undefined samples are not dropped by default, preferring to let users make that choice actively.

IIRC, our original decision not to expose this parameter in alpha and core-metrics was just about API simplicity. I suspect wiring it up would be pretty easy if we decide there's value in doing so.

@antgonza
Copy link
Member

Good to know. IMOO it will be better to have that flag default to True (drop) as I'm not aware of any direct utility of keeping the Nones in the results but I might be missing something ... thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants