Kraken ZeroDivisionError #1276

Closed
choudharis2 opened this issue Aug 19, 2020 · 8 comments · Fixed by #1347
Labels
bug: module Bug in a MultiQC module

Comments

@choudharis2

choudharis2 commented Aug 19, 2020

Description of bug:
Some of our samples trigger the following Kraken module error in MultiQC 1.9. We are using kraken2/2.0.7.

MultiQC Error log:

Module kraken raised an exception: Traceback (most recent call last):
  File "/python3.7/site-packages/multiqc-1.9-py3.7.egg/multiqc/multiqc.py", line 569, in run
    output = mod()
  File "//multiqc/1.9/lib/python3.7/site-packages/multiqc-1.9-py3.7.egg/multiqc/modules/kraken/kraken.py", line 60, in __init__
    self.sum_sample_counts()
  File "//multiqc/1.9/lib/python3.7/site-packages/multiqc-1.9-py3.7.egg/multiqc/modules/kraken/kraken.py", line 134, in sum_sample_counts
    self.kraken_sample_total_readcounts[s_name] = round(float(row['counts_rooted']) / (row['percent'] / 100.0))
ZeroDivisionError: float division by zero

File that triggers the error:

MultiQC run details (please complete the following):

  • Command used to run MultiQC:
  • MultiQC Version:
  • Operating System:
  • Python Version:
  • Method of MultiQC installation:

Additional context

@acbellorib

Hi there,
thanks for providing this wonderful tool! Just to report that I got exactly the same error message when running MultiQC 1.9 (installed in a fresh Conda environment with Python 3.7, as suggested in the MultiQC manual, on Linux CentOS 7 with kernel 3.10.0-957.27.2.el7.x86_64) on one of our Kraken (1.1.1) reports. I will continue to investigate the issue on my side, but any hint or help would be highly appreciated! Cheers, Antonio.

@apeltzer
Contributor

apeltzer commented Oct 1, 2020

Hi both!

I've just encountered the same issue and found the reason for this behaviour. If you run Kraken2 with the parameter --report-zero-counts, some categories will have a denominator of zero. This leads to a division by zero, a ZeroDivisionError raised by Python, and a failure of the MultiQC kraken module.

https://github.com/nf-core/viralrecon/blob/master/main.nf#L2076--L2085

(as example code). When I remove --report-zero-counts, the module runs fine for all the datasets that previously crashed.
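A minimal sketch of the failure described above, assuming a report row whose percentage column was parsed as 0.0. The `row` dictionary and its values are invented for illustration, and `estimate_total` is a hypothetical wrapper; only the expression inside it is taken from the traceback.

```python
# Hypothetical parsed report row: the keys mirror those in the traceback,
# the values are made up for illustration.
row = {"counts_rooted": 0, "percent": 0.0}


def estimate_total(row):
    # Same expression as the line shown in the traceback (kraken.py, line 134)
    return round(float(row["counts_rooted"]) / (row["percent"] / 100.0))


try:
    estimate_total(row)
    failed = False
except ZeroDivisionError as err:
    # A percentage of 0.0 makes the denominator 0.0, so Python raises here
    failed = True
    print(f"ZeroDivisionError: {err}")
```

Any row with a percentage of exactly zero would fail the same way, regardless of how it ended up in the report.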

A possible fix would be to add a pseudo-count to zero counts (?) so that the division cannot fail here: https://github.com/ewels/MultiQC/blob/7584e64e10885f38367628a5a4a9033d48e82011/multiqc/modules/kraken/kraken.py#L134

@ewels what do you think? I can probably take this on then for an upcoming MultiQC release....

@apeltzer
Contributor

apeltzer commented Oct 1, 2020

Sorry if that wasn't clear: for me the cause was --report-zero-counts, but I'm not sure whether this holds true for everyone else. I believe this setting controls whether zero counts are displayed in the output at all, so until we have a working bugfix in the MultiQC kraken module, simply omitting that parameter in kraken/kraken2 should make the current module work for you @choudharis2, @acbellorib and @rpetit3

@acbellorib

Thanks, Alexander, for the verification and tip! I'll run some tests asap! Cheers, Antonio.

@davidealbanese
Contributor

davidealbanese commented Nov 18, 2020

I encountered the same error using kraken2 2.0.9-beta without the option --report-zero-counts and MultiQC 1.9.
I think that using the kraken2 percentages (row['percent']) in:

https://github.com/ewels/MultiQC/blob/7584e64e10885f38367628a5a4a9033d48e82011/multiqc/modules/kraken/kraken.py#L134

is quite dangerous since they include only two decimal digits and can potentially be 0.00 in all samples considered.

Files that trigger the error:
kraken_reports.tar.gz
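To illustrate the two-decimal concern raised in this comment, here is a rough sketch with made-up numbers. It assumes the report percentage is the clade fraction formatted to two decimal places; `report_percent` is a hypothetical stand-in for that formatting, not Kraken2 code.

```python
def report_percent(clade_reads, total_reads):
    # Stand-in for a two-decimal percentage as it would appear in report text
    return float(f"{100.0 * clade_reads / total_reads:.2f}")


total_reads = 10_000_000  # made-up library size

# A rare clade: 300 reads is 0.003 %, which prints as 0.00
rare_percent = report_percent(300, total_reads)
print(rare_percent)  # 0.0

# A larger clade: 12,345 reads is 0.12345 %, which prints as 0.12
percent = report_percent(12_345, total_reads)
estimate = round(12_345 / (percent / 100.0))
print(estimate, "estimated vs", total_reads, "actual")
```

Under these assumptions, a rare clade yields a zero denominator, and even a non-zero percentage gives back a total that is off by a few percent.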

@ewels ewels added the bug: module Bug in a MultiQC module label Dec 28, 2020
@ewels
Member

ewels commented Dec 28, 2020

Thanks all! Sounds sensible @apeltzer though I would prefer to simply catch the exception - that's what I usually do in these circumstances. eg:

try:
    self.kraken_sample_total_readcounts[s_name] = round(float(row['counts_rooted']) / (row['percent'] / 100.0))
except ZeroDivisionError:
    self.kraken_sample_total_readcounts[s_name] = 0
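For anyone who wants to try the snippet above outside of MultiQC, here is a self-contained sketch of the same exception-catching idea. The function name `safe_total_readcount` and the example rows are hypothetical and not part of the module.

```python
def safe_total_readcount(row):
    """Estimate total reads from one report row; fall back to 0 if percent is zero."""
    try:
        return round(float(row["counts_rooted"]) / (row["percent"] / 100.0))
    except ZeroDivisionError:
        # Same fallback as in the snippet above: record a total of 0
        return 0


# Made-up rows for illustration
print(safe_total_readcount({"counts_rooted": 500, "percent": 25.0}))  # 2000
print(safe_total_readcount({"counts_rooted": 0, "percent": 0.0}))     # 0
```

Note that the fallback keeps the module running but records a total of 0 for that sample, which is not a real read count.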

@davidealbanese - is there an alternative to using the percentages? I would assume from looking at the code snippet that this is a workaround to get the total counts which would be impossible by other means. If we have actual total counts then I agree that it would be better. I have a vague, vague memory that the report might truncate off some categories which meant that they couldn't be summed to a library total or something? But I could be confusing this with another MultiQC module.

Phil

@ewels ewels changed the title from "The 'kraken' MultiQC module broke" to "Kraken ZeroDivisionError" Dec 28, 2020
@davidealbanese
Contributor

@ewels, PR #1347 fixes the issue by using only the counts. The total count is simply the sum of the counts for each taxon.
I'm pretty sure that kraken2 only reports the classified reads, and the reported percentages refer to that number.
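A rough sketch of the counts-only idea, not the actual code from the PR. It assumes the standard tab-separated Kraken report layout (percentage, clade reads, directly assigned reads, rank code, taxid, name); the function name `total_reads_from_report` and the example rows are made up for illustration.

```python
def total_reads_from_report(lines):
    """Sum the reads assigned directly to each taxon (third column)."""
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        total += int(fields[2])
    return total


# Made-up report rows: percent, clade reads, direct reads, rank, taxid, name
example_report = [
    "100.00\t1000\t5\tR\t1\troot",
    " 99.50\t995\t995\tS\t562\t    Escherichia coli",
    "  0.00\t0\t0\tS\t1280\t    Staphylococcus aureus",
]

print(total_reads_from_report(example_report))  # 1000
```

Because no percentage is involved, a row with 0.00 percent simply contributes zero reads to the sum and no division takes place.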

@ewels
Member

ewels commented Mar 4, 2021

Should now be fixed 👍🏻 Thanks @davidealbanese !

5 participants