Per-read statistics and general optimization
This release includes new features for investigating per-read, per-base modified base detection. Studying the distributions of per-read statistics led to better default per-read statistic thresholds, which improve modified base detection on validation data sets. This version also extends the use of the dampened fraction of modified bases to better handle samples with variable coverage.
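To illustrate why a dampened fraction helps with variable coverage, here is a minimal sketch that adds pseudo-counts to the modified and unmodified read counts at a site before taking the fraction. This behaves like a beta prior, so a site covered by a single modified read is no longer reported as 100% modified. The function name `dampened_fraction` and the default pseudo-counts (2 unmodified, 0 modified) are illustrative assumptions, not necessarily Tombo's API or defaults.

```python
def dampened_fraction(n_mod: int, n_unmod: int,
                      unmod_pseudo: float = 2.0,
                      mod_pseudo: float = 0.0) -> float:
    """Fraction of reads called modified at one site, with pseudo-counts.

    Hypothetical helper: the pseudo-counts are an assumed prior that pulls
    low-coverage estimates toward mod_pseudo / (mod_pseudo + unmod_pseudo),
    while high-coverage sites are barely affected.
    """
    total = n_mod + n_unmod + mod_pseudo + unmod_pseudo
    if total == 0:
        raise ValueError("site has no reads and no pseudo-counts")
    return (n_mod + mod_pseudo) / total


# One modified read out of one: raw fraction 1.0, dampened 1/3.
low_cov = dampened_fraction(1, 0)
# 50 modified reads out of 100: raw 0.5, dampened 50/102 (about 0.49).
high_cov = dampened_fraction(50, 50)
```

With both pseudo-counts set to zero the function reduces to the raw fraction, so the dampening only matters where coverage is low.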
The release also fixes issues from the previous version. The major user issues addressed are:
- More efficient processing of large genomes, which previously resulted in very high memory usage
  - This addresses both computational and memory usage issues in the re-squiggle and test_significance commands.
- Fixes for issues specific to RNA processing: truncation of long transcript names, and samples that map to different sets of sequence records/transcripts
- Better protection against read file corruption caused by access from multiple independent, concurrent Tombo commands