Releases: rhpvorderman/sequali
version 0.12.0
- Properly name percentiles as such in the sequence length distribution rather
than using N50 nomenclature, which is not correct.
- Fix a bug where BAM files with missing quality sequences were improperly
handled.
- Update the internal UniVec database to the version from November 21st 2023.
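The naming fix matters because a percentile and an N50 value answer different questions. The sketch below is illustrative only and is not Sequali's code: the function names and the nearest-rank percentile definition are assumptions. A percentile is taken over reads, whereas N50 is weighted by the number of bases.

```python
import math

def length_percentile(lengths, percent):
    """Nearest-rank percentile over reads (assumed definition)."""
    ordered = sorted(lengths)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[rank - 1]

def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    half_of_bases = sum(lengths) / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= half_of_bases:
            return length

lengths = [100, 200, 300, 400, 1000]
print(length_percentile(lengths, 50))  # 300
print(n50(lengths))                    # 1000
```

For the same five reads the 50th percentile is 300 while the N50 is 1000, so labelling one as the other is misleading.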
version 0.11.1
- Fix a memory leak that occurred in Python 3.12 due to a refcounting API
change.
version 0.11.0
- Make figure IDs reproducible across HTML reports.
- Fix a bug where the average phred score per read would be rounded, not
floored. This would lead reads with a phred score such as 9.7 to be counted
towards the Q>=10 results.
- Replace some of the hand-vectorized code with more generic code that can
automatically be optimized by the compiler. This should make things faster on
Windows and ARM64 platforms. This also means results should be consistent
across platforms and no longer depend on the presence of vector instructions.
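The rounding fix in this release can be illustrated with a small sketch. The helper name and its parameters are hypothetical and only show the difference between rounding and flooring; how Sequali computes the average itself is not covered here.

```python
import math

def passes_threshold(mean_phred, threshold, use_floor=True):
    """Check whether a read's average phred score reaches a Q threshold."""
    # Flooring keeps 9.7 in the Q9 bin; rounding would move it up to Q10.
    binned = math.floor(mean_phred) if use_floor else round(mean_phred)
    return binned >= threshold

print(passes_threshold(9.7, 10))                   # False
print(passes_threshold(9.7, 10, use_floor=False))  # True
```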
version 0.10.0
- Make overrepresented sequences table scrollable and smaller so it is easier
to skip over when lots of entries are found.
- Overrepresented sequence analysis now only counts unique fragments per read.
Fragments that are duplicated inside the read are counted only once. This
prevents long stretches of genomic repeats getting higher reported
frequencies.
- Speed up sequence identity calculations on AVX2 enabled CPUs by using a
reverse-diagonal parallelized version of Smith-Waterman.
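The effect of counting unique fragments per read can be sketched as follows. This is a simplified illustration, not Sequali's implementation: the function name, the fragment length k and the cutting of reads into non-overlapping fragments are all assumptions.

```python
from collections import Counter

def count_fragments(reads, k, unique_per_read=True):
    """Count in how many reads (or how often in total) each fragment occurs."""
    counts = Counter()
    for read in reads:
        # Assumed sampling scheme: non-overlapping fragments of length k.
        fragments = [read[i:i + k] for i in range(0, len(read) - k + 1, k)]
        if unique_per_read:
            # A fragment repeated inside one read is counted once for that read.
            fragments = set(fragments)
        counts.update(fragments)
    return counts

reads = ["ACGACGACG", "ACGTTT"]
print(count_fragments(reads, 3)["ACG"])                         # 2
print(count_fragments(reads, 3, unique_per_read=False)["ACG"])  # 4
```

Without the per-read deduplication, the repeat in the first read inflates the count for ACG from 2 to 4.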
version 0.9.1
- Fix an issue where the insert size metrics module would crash when no
adapters were present.
version 0.9.0
- MultiQC support since MultiQC version 1.22
- Sort modules for paired end reports in the same order as single end reports.
For example, the sequence length distributions for read 1 and read 2 are now
right after each other.
- Add common human genome repeats and Illumina poly-G dark cycles to the
overrepresented sequences database.
- Illumina adapter trimming sequences were added to the contaminants database
as these were missing from the UniVec database.
- Sequence identity, rather than kmers matched, is shown as a metric for
similarity in the overrepresented sequences table.
- Overrepresented sequence classification now uses stable sorting to ensure
the classification results are the same on each rerun.
- Overrepresented sequences are now classified using Smith-Waterman alignment
and sequence identity.
- Fix an off-by-one error in the insert size metrics that was triggered for
insert sizes larger than 300 bp.
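Why stable sorting gives reproducible classifications can be shown with a short sketch. The hit names and identity values below are made up for illustration, and the helper is hypothetical rather than part of Sequali.

```python
# Hypothetical (name, sequence identity) hits, listed in database order.
hits = [("adapter_A", 0.95), ("vector_B", 0.90), ("adapter_C", 0.95)]

def best_hit(hits):
    """Return the hit with the highest identity; ties go to the earliest entry."""
    # sorted() is stable, also with reverse=True: hits with equal identity
    # keep their database order, so the winner is the same on every run.
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return ranked[0]

print(best_hit(hits))  # ('adapter_A', 0.95)
```

With a sort that does not guarantee stability, either of the two 0.95 hits could end up first.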
version 0.8.0
- A citation file was added to the repository.
- Calculate insert sizes and used adapters based on overlap between the
read pairs.
- Both reads from paired-end reads are taken into consideration when
evaluating the duplication rate.
- Support for paired-end reads added.
- Minor performance improvement by providing a non-temporal cache hint in the
QCMetrics module.
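The idea of deriving the insert size and the adapters from the overlap between a read pair can be sketched as below. This is a simplified illustration under stated assumptions, not Sequali's algorithm: it requires an exact match, only handles inserts that are not longer than the reads, and the function names and the min_overlap parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def insert_size_from_overlap(read1, read2, min_overlap=4):
    """Return (insert_size, adapter_in_read1, adapter_in_read2) or None."""
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    # When the insert is shorter than the reads, the start of read 1 equals
    # the end of the reverse-complemented read 2. Try the longest overlap first.
    for size in range(longest, min_overlap - 1, -1):
        if read1[:size] == rc2[-size:]:
            # Everything after the insert is adapter sequence.
            return size, read1[size:], read2[size:]
    return None

insert = "ACGTTGCAAG"
read1 = insert + "AGAT"                      # read-through into the adapter
read2 = reverse_complement(insert) + "AGAT"
print(insert_size_from_overlap(read1, read2))  # (10, 'AGAT', 'AGAT')
```

A practical implementation would also need to tolerate sequencing errors in the overlap and handle inserts longer than the read length.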
version 0.7.1
- Fix a small visual bug in the report sidebar.
- PyGAL report htmls are now fully HTML5 compliant. HTML5 validation has been
made a part of the integration testing.
version 0.7.0
- Image files can now be saved as SVG files.
- The JavaScript file for the tooltip highlighting is now embedded in the
HTML file so no internet access is needed for the functionality.
- A sidebar with a table of contents is added to the report for easier
navigation.
- Graph fonts are made a little bigger. Graphs now respond to zooming in and
out on the web page.
- Enable building on ARM platforms such as M1 Macintosh and AArch64.
- Speed up the overrepresented sequences module by adding an AVX2 k-mer
construction algorithm.
version 0.6.0
- Add links to the documentation in the report.
- Moved documentation to readthedocs and added extensive module documentation.
- Change the -deduplication-estimate-bits option to the more understandable
--duplication-max-stored-fingerprints.
- Add a small table that lists how many reads are >=Q5, >=Q7 etc. in the
per sequence average quality report.
- The progressbar can track progress through more file formats.
- The deduplication fingerprint that is used is now configurable from the
command line.
- The deduplication module starts by gathering all sequences rather than half
of the sequences. This allows all sequences to be considered using a big
enough hash table.
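The relationship between the fingerprint table size and the reads that are considered can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the function name, the fingerprint built from the first and last bases of a read, and the choice to ignore new fingerprints once the table is full. Sequali's actual strategy may differ.

```python
def count_fingerprints(reads, max_stored, front=4, back=4):
    """Count fingerprints, storing at most max_stored distinct ones."""
    counts = {}
    for read in reads:
        # Assumed fingerprint: a few bases from the start and end of the read.
        fingerprint = read[:front] + read[-back:]
        if fingerprint in counts:
            counts[fingerprint] += 1
        elif len(counts) < max_stored:
            counts[fingerprint] = 1
        # Otherwise the table is full and this new fingerprint is skipped.
    return counts

reads = ["AAAACCCCGGGG", "AAAATTTTGGGG", "CCCCAAAATTTT", "AAAACCCCGGGG"]
print(count_fingerprints(reads, max_stored=10))  # {'AAAAGGGG': 3, 'CCCCTTTT': 1}
print(count_fingerprints(reads, max_stored=1))   # {'AAAAGGGG': 3}
```

With a table that is large enough, every read contributes to the counts; with a table that is too small, some reads are left out.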