-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heatmap for Per Sequence Quality Scores is an extra base long #812
Comments
Hi @MattBashton, Thanks for the detailed report. I guess you are specifically talking about the hover text when moving the cursor over the heatmap? The colouring of the heatmap looks fine to me when I make a report with your files, but I see that the text does report 0-100 with 100% on base 100 as you say. I've just had a look at the code and this hover text is generated dynamically by taking the cursor position and reverse-engineering the colour code underneath it. The 100% G is because the browser is incorrectly reporting Note that it can be a little tricky to get precisely the correct base numbers because of the way that FastQC reports this data. It does so with a base position range, which can vary:
The data behind the MultiQC plot needs to have single data points (for the line graph), so I get around this by just picking the average value of the range and using that. In short - it's best to be a little careful with either a MultiQC or FastQC report if you really need per-base accuracy, especially at the end of reads. Interestingly the FastQC HTML reports themselves say that your reads are 101bp long! Maybe a bug in FastQC..? Phil |
Hi Phil, yes just to confirm, will the fix prevent the 100% G base being reported on mouse over? I feel this is important because checking for poor calls at the end of reads is one of them main usage cases of both FastQC and MulltiQC prior to trimming, so eliminating this bug is a good thing. Given this behaviour (at present) one would incorrectly assume their reads were all 100% G at position 100, (which is of course a bogus position 101) if we incorrectly assuming 1 based indexing (which is reasonable), but incorrectly interpreting this plot readout could cause the end user to consider trimming base 100 - which would be unnecessary. You sate that you can't prevent the 100% G being reported? This is a critical issue IMO. Another problem this plot has too is that the averages (par the final 100 position) appear to be reported in bins of two, so positions 0-1, have the same Q score, as do 1-2, and 2-3, i.e for 0,1 is the same value as is 1-2 and 2-3. However here is a another issue, every base is reported twice, so moving left to right the mouse over shows stats at the top of the plot for 0,1,1,2,2 but each successive report of frequencies for the bases differs, so bp 2 is reported as: 34, 21, 22, 23% initially but move 1 pixel to the left and the readout for bp 2 is now: 26, 24, 25, 25 for read 1. This is somewhat broken because how is the end user supposed to interpret the correct value for the 2nd base pair when two are reported. I fully understand that underlying this data FastQC is reporting them in 2bp bins, which is why this happens but the heatmap needs to be transparent to this, otherwise mis interpretation can occur. Perhaps you should render the heatmap at 2pb resolution, rather than reporting every data point twice with different values? Yes I had noticed that current FastQC reports these reads to be 101bp in length, although it's plots don't runt to 101bp, suspect this is also a bug in FastQC, although as downstream input it's one you might want to be aware off. |
I've attached my report output to show what I mean. |
I should update this the reads are indeed 101bp long, forgot emacs uses 0 based indexing whilst checking. However the 101 based being reported as 100% G is clearly not correct. |
Show the original base pair range label when hovering on the sequence content line plot data. see #812.
Use the original data instead of calculating it from the visible pixels. See issue #812
I was a little concerned that implementing the changes you suggest would involve rewriting a tonne of the code. However, after having a poke around and refreshing myself with how the plot is created, I realised that most of the pieces were already in place. I've refactored the code for both the heatmap and the line plot to show the original FastQC base pair range. I've also updated the heatmap to show the raw base percentages instead of back-converting from the colours (though with no decimal places showing it's unlikely that this will make a detectable difference). The side effect of pulling the original data in this way is that it's fixed the 100% G bug 🎉 I quite enjoyed fixing this one! Please give it a try and confirm that it works as expected now. |
Added 3px x offset for picking up the base position when hovering on the sequence content heatmap. Done empirically...! See #812
Hey glad to hear you enjoyed fixing it :) I've re ran my end, and the output looks flawless with respect to this issue, so glad this got fixed!!! Which was my motivation for reporting it! Attached is the output. |
Great! Thanks for reporting and testing 👍 |
Description of bug:
Per Sequence Quality Scores heatmap is incorrect on the final base (which is actually non existent), my reads are 100bp long, the individual sample report will show positions numbered 1 to 100 with the final base (100) having sane levels of A, T, C and G i.e. 24-26% in my various input files. Using the heatmap however will produce an image with bases numbered positions 0-100 (i.e. 101 bp in length) here the final base is always 100% G however this data point is bogus. I have confirmed the actual length of my input reads to be 100bp using a text editor (emacs with column numbering mode enabled). Additionally these two plots should ideally either be both 1 indexed or 0 indexed but not a mixture of both
MultiQC Error log:
None generated
Files that triggers the error:
Case_001_R2_fastqc.zip
Case_001_R1_fastqc.zip
MultiQC run details (please complete the following):
multiqc <dir>
pip install --user
Additional context
FastQC v0.11.7
The text was updated successfully, but these errors were encountered: