Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bugfix] fgbio error rate by read position per-base plots #1251

Merged
merged 3 commits into from
Jul 23, 2020

Conversation

nh13
Copy link
Contributor

@nh13 nh13 commented Jul 15, 2020

If the error rate for any base-to-base (ex. A>C/a_to_c_error_rate column ) error
was higher than the overall error rate (error_rate column), then it
would be filtered out by y_max.

This fix calculates y_max across not only the overall error rate (across
positions), but also the base-to-base error rate columns.

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md has been updated

nh13 added 2 commits July 15, 2020 16:27
If the error rate for any base-to-base (ex. A>C/a_to_c_error_rate column ) error
was higher than the overall error rate (error_rate column), then it
would be filtered out by y_max.

This fix calculates y_max across not only the overall error rate (across
positions), but also the base-to-base error rate columns.
@nh13
Copy link
Contributor Author

nh13 commented Jul 15, 2020

@ewels I was suprised that setting ymax causes data to be discarded versus adjusting the field of view of the data. For example, I would have expected setting ymax to 0.5 would mean not that points like 0.7 would be discarded, just that they'd be off the screen. For a line graph, I would have expected the line to go off the screen, then come back down. I would have also been ok with all values above ymax be set to ymax. But I think the worst thing that can happen is for them to be discarded. Because then in the linegraph, we get zero values for missing data, which falsely tells us the error rate for that position/cycle is zero, when it really was >ymax. Does that make sense why discarding is dangerous?

@nh13
Copy link
Contributor Author

nh13 commented Jul 15, 2020

@nh13
Copy link
Contributor Author

nh13 commented Jul 22, 2020

@ewels any thoughts on getting this merged and a new release?

CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@ewels ewels merged commit 8fcd655 into MultiQC:master Jul 23, 2020
@ewels
Copy link
Member

ewels commented Jul 23, 2020

Apologies @nh13 - I've been swamped with hackathons and conferences lately and am focussing on @nf-core stuff (which has waited longer than MultiQC for a release).

I'll get back to MultiQC + a new release ASAP, but won't be for a few weeks as tomorrow is my last day before some holiday. In the mean time I've merged this so it's at least fixed on dev 👍🏻

@ewels
Copy link
Member

ewels commented Jul 23, 2020

ps. Yes I agree that it seems odd and a little dangerous to discard data when above ymax. Digging back through the git blame I apparently added this about 3 years ago (9599630) for the following reason:

  • Line graphs and scatter graphs axis limits
    • If limits are specified, data exceeding this is no longer saved in report
    • Visually identical, but can make report file sizes considerable smaller in some cases

If you look in the same commit, I was also modifying the Preseq module - from memory that was spitting out these huge exponential curves with thousands and thousands of points. So in that context it kind of made sense, but I agree that if it's just ditching a single point in an otherwise bumpy line then this looks like missing data and is misleading.

I guess the correct solution here is to only discard those points if the curve doesn't come back under ymax again. Then that can correctly truncate the crazy preseq curves and save on report size, but won't drop data mid-way through a plot. Does that sound ok? I'll make an issue..

Phil

@ewels
Copy link
Member

ewels commented Jul 23, 2020

x-ref #1257

@nh13
Copy link
Contributor Author

nh13 commented Jul 23, 2020

Thanks! Enjoy your vacation!

@nh13 nh13 deleted the bugfix/fgbio-error-rate-by-read-position branch August 20, 2020 21:14
@nh13
Copy link
Contributor Author

nh13 commented Aug 20, 2020

@ewels any chance we could have a release with this included?

@ewels
Copy link
Member

ewels commented Mar 29, 2021

@ng13 - the data discarding problem you saw should now be fixed in 74b781e (#1257)

Thanks for spotting it and pointing it out 👍🏻 Let me know if you find any issues with the new code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants