Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[1.7.0] - 2024-10-03

  • #254, added the ability to run PepSIRF as a Docker image and added a page for instructions
  • #197, resolved CMake not locating OpenMP on MacOS. Tutorial for fix added to installation page.
  • #236, added functionality to the "-i" option in Subjoin to accept a regex pattern instead of a filename which contains sample/peptide names. The sample/peptide names taken from the score matrix file are filtered by whether they contain a match for the regex pattern.
  • #234, added "--unmapped-reads-output" option to Demux, which writes all reads that have not been mapped to a sample/peptide to the specified filename.
  • #233, changed Deconv "-t" option to accept a tab-delimited file with a column for each TaxID and a column for the score threshold to use for that TaxID. The original functionality still holds: if a number is included with the option, each TaxID will use that score threshold.
  • #227, Demux outputs additional information about the total number of samples, the number of samples containing a given number of replicates, and the number of samples starting with "Sblk_". The replicate information will be written to the file provided with the option "--replicate_info".
  • #223, Added "--exclude" option to subjoin that changes the output data file to contain all of the input samples/peptides except the ones specified by the user.
  • #221, Demux automatically truncates sequences in the library which are longer than the length provided through the "--seq" option. If a sequence is found to be shorter than the specified length, an error is thrown.
  • #218, Added "--custom_id_name_map_info" option to Deconv which accepts a filename, the key column header, and the value column header in the file to use to link TaxIDs to taxon names. This option should be used instead of "--id_name_map" if the user wishes to define a tab-delimited ID name map.
  • #210, Fixes crash in Link when a species does not have an associated ID. A single warning is logged which informs the user some species have not been considered and where to find a list of those species which should be reviewed.
  • #152, Automated tests have been added and finished to test all recently added features and fixed issues in PepSIRF.
  • #131, Provides more information in Enrich's failed enrichment output. Sample replicates which do not meet either threshold are identified in the output and are marked as either not meeting the minimum or maximum threshold.
  • #56, Alters behavior of Demux when run in reference-independent mode. In ref-independent mode, index toggling is turned off; therefore, if an exact match at the given index is not found, the read is discarded.
  • #2, Adds a system to handle logging PepSIRF's progress when running. A default file name is automatically generated with the module name, current time and date. An option '--logfile' allows the user to provide a custom name for the log file.
  • #36, Standardizes the order tied species are listed in Deconv output. If species names are provided, then the tied species are sorted alphabetically by their names; otherwise, they are sorted by their species ID.
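
The tie ordering described in #36 can be sketched in a few lines of Python. This is an illustration only, not PepSIRF's code: the function `order_tied_species`, the `id_to_name` mapping, and the example IDs and names are hypothetical, and numeric ordering of species IDs is an assumption.

```python
from typing import Dict, List, Optional


def order_tied_species(tied_ids: List[int],
                       id_to_name: Optional[Dict[int, str]] = None) -> List[int]:
    """Return tied species IDs in a deterministic order.

    If species names are provided, sort alphabetically by name;
    otherwise sort by species ID (numeric order is an assumption).
    """
    if id_to_name:
        # Every tied ID is assumed to have a name in the mapping.
        return sorted(tied_ids, key=lambda species_id: id_to_name[species_id])
    return sorted(tied_ids)


# Hypothetical IDs and names, used only to show the two orderings.
tied = [10, 30, 20]
names = {30: "Alpha", 10: "Gamma", 20: "Beta"}

by_name = order_tied_species(tied, names)  # [30, 20, 10]
by_id = order_tied_species(tied)           # [10, 20, 30]
```

Either ordering is stable across runs, which is the point of the change: the same tie always produces the same output line.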

[1.6.0]

  • #169, Added an option for FASTQ-level outputs to be generated by demux. This is done with the flag "-q" followed by a directory path where files will be generated.
  • #178, in the case of a sample not having enriched peptides, enrich will now add a space to the empty file. This allows for better compatibility with deconv through Qiime2.
  • #137, added an option for enrich to drop replicates with low raw read counts. This is done with the flag "-l" or "--low_raw_reads". If this functionality is invoked, dropped replicates will not be considered in the enrichment process, and the dropped replicates will be reported in the enrichment failure reasons file under "Removed Replicates": each line will contain the replicates removed from a sample.
  • #131, enrich now reports which replicates caused a raw read count threshold failure; and identifies if a replicate failed the maximum or minimum threshold.
  • #161, added a flag to deconv that allows the user to specify what string is expected at the end of each file containing enriched peptides (set to "_enriched.txt" by default). If a file does not end in the specified string, deconv skips over that file.
  • #149, added feature to info that generates a matrix of average counts given replicates. Two new flags must be included in order to use this feature: --rep_names and --get_avgs. --rep_names requires an input file with the names of the replicates that the user wants to generate a matrix of average counts for. --get_avgs requires an output file name where the matrix will be stored.

[1.5.1]

  • #154, altered behavior of enrich to produce blank sample file output for samples that failed enrichment.
  • #168, fixed bug introduced in release 1.5, where amino acid level output is overwritten with peptide level output. This no longer occurs.

[1.5.0]

  • #35, added new feature to demux. If samplenames or index name sets have duplicates in samplelist file, then those duplicates will be output to the terminal.
  • #57, demux now has an additional option for providing a tab-delimited file with 5 ordered columns: 1) index name, which should correspond to a header name in the sample sheet, 2) read name, which should be either "r1" or "r2" to specify whether the index is in "--input_r1" or "--input_r2", 3) index start location (0-based, inclusive), 4) index length and 5) number of mismatches to allow. Note: the last three columns correspond to the info currently provided on the command line with "--f_index" and "--r_index" (or "--index1" and "--index2", with recent changes). With this feature, the demux module can now analyze an arbitrary number of indexes to be found in r1 or r2 input sequences.
  • #57, demux output diagnostics may now provide more index matches for flexibility with demux changes in #57.
  • #138, demux now automatically removes reference duplicates when running in a reference dependent mode.
  • #105, a check is added that verifies the bins provided to the Z score module. It is no longer possible to run the Z score module with the wrong set of bins.
  • #156, solved memory race condition in demux created during development of this release.
  • #163, solved memory race condition in demux that created incorrect counts.
  • #162, removed threading support on MacOS.
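
The five-column index file introduced in #57 can be read with a short parser. The sketch below is a hedged illustration rather than PepSIRF's implementation: `IndexSpec`, `parse_index_specs`, and `extract_index` are hypothetical names, the file is assumed to have no header line, and the example rows are invented.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class IndexSpec:
    name: str        # should match a header name in the sample sheet
    read: str        # "r1" or "r2"
    start: int       # index start location, 0-based and inclusive
    length: int      # index length
    mismatches: int  # number of mismatches to allow


def parse_index_specs(text: str) -> List[IndexSpec]:
    """Parse tab-delimited rows into IndexSpec records (no header assumed)."""
    specs = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}: {row}")
        name, read, start, length, mismatches = row
        if read not in ("r1", "r2"):
            raise ValueError(f"read name must be 'r1' or 'r2', got {read!r}")
        specs.append(IndexSpec(name, read, int(start), int(length), int(mismatches)))
    return specs


def extract_index(read_seq: str, spec: IndexSpec) -> str:
    """Slice the index region out of a read using the 0-based start and length."""
    return read_seq[spec.start:spec.start + spec.length]


# Invented example content: one index in r1 and one in r2.
example = "Index1\tr1\t12\t10\t1\nIndex2\tr2\t0\t8\t0\n"
specs = parse_index_specs(example)
```

Because each row names its own read, start, and length, any number of indexes can be listed without adding new command-line options.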

[1.4.0] - 2021-07-09

  • #117, CMakeLists has been updated to include a new flag in the CXX flags: '-Xpreprocessor'. This flag makes compiling the C++ code in different environments easier. The issue arose when an attempt to compile PepSIRF on 'Big Sur' failed due to an error with '-fopenmp'.
  • #116, the link module encountered an error when protein sequences were not found in the metadata map. This situation is now handled, and an error is thrown stating that a sequence was not found in the metadata file.
  • #114, s_enrich and p_enrich have been merged into a single module 'enrich'. A single samplename and a pair of samplenames work with the same behavior as s_enrich and p_enrich, respectively. Additionally, >2 replicates can be analyzed to generate enriched peptides. See help options for an update on the options.
  • #103, the enrich module (previously two separate modules, s_enrich and p_enrich) now features an optional flag that outputs failed enrichment replicate sets to a tsv file. Any set of replicates which failed to generate enriched peptides is listed in this file. Each row contains a column for replicate samplenames and a column containing the reason.

[1.3.7] - 2021-06-28

  • #125, norm module incorrectly stored peptide names with the assumption that in diff, diff-ratio, or ratio the control and original matrix are in the same order. The peptide names should not be assumed to be in the same order. This has been updated by changing the type of container used and the method of access.
  • #104, norm module help message updated. ('-p', '--peptide_scores') option in the norm help message states in the final sentence "This file should be in the same format as the output from the deconv module.". This is incorrect - deconv should be demux - it should read "This file should be in the same format as the output from the demux module.".

[1.3.6] - 2021-06-09

  • #96, demux module now includes a warning when index/barcode names from the samplelist are not included in the fasta file provided by (--index). The warning includes a list of missing names.
  • #97, zscore module now verifies the correct type of file is provided for (--bins). Assuming the incorrect file provided will be a score matrix tsv, the verification process is a check of the second line in the tsv. If a numerical value or 'inf' or 'NaN' is found, then an error is thrown with a message stating to check the file provided.
  • #99, fixed a subjoin module bug with names in the output score matrix. The subjoin module can update the sample/peptide names for the output matrix using a second column in the namelist file provided as the second file in the pair. In version 1.3.5, this feature did not work correctly: renaming to a name that already existed in the matrix, and that would itself be renamed, was prone to associating the wrong score column with a name. E.g., an original name "pv1_001" changed to "pv1_101", with "pv1_001" also being used elsewhere in the renaming, was prone to switching names around due to stochastic ordering of names in the matrix. This has been fixed by adding an 'upate_labels' method and altering the structure of the container storing names from the namelist file. There is also now a check that the namelist file can be opened before continuing the run.
  • #100, the previously added demux diagnostic feature miscalculated counts for index1, pair, and var region matches. This was in part due to conditional checks for pair matches and the presence of barcode/index names from the index file that are not included in the samplelist file. The fix removes the unused names from the container generated from the index file (the file itself is not edited) and uses a more concise condition to verify that a pair match occurred.
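
The renaming hazard described in #99 is easiest to see in a small example. The sketch below is not the method used in PepSIRF; `apply_renames` is a hypothetical helper showing one order-independent approach, in which every new name is looked up from the original name at that position, so a rename can never be applied on top of another rename.

```python
from typing import Dict, List


def apply_renames(names: List[str], renames: Dict[str, str]) -> List[str]:
    """Rename labels in one pass, keeping each label at its original position.

    Each output label depends only on the original label, so the result
    does not change with the order in which the renames are listed.
    """
    return [renames.get(name, name) for name in names]


# Hypothetical column labels: two names swap, one is left untouched.
columns = ["pv1_001", "pv1_101", "pv1_002"]
renames = {"pv1_001": "pv1_101", "pv1_101": "pv1_001"}

result = apply_renames(columns, renames)
# result == ["pv1_101", "pv1_001", "pv1_002"]
```

Since positions never move, each score column stays attached to the label derived from its own original name.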

[1.3.5] - 2021-03-02

  • Fixed penrich bug with threshold verification for enrichment candidates and raw score pairs.
  • Fixed bug with norm which forced input of id or names for use with negative control.
  • Fixed bug with norm incorrectly accessing matrix elements for diff, ratio, diffratio.

[1.3.4] - 2020-12-28

  • Penrich now uses a tab-delimited file containing a matrix file name and its threshold(s) per each line. A matrix file may contain zscores or normalized counts of each peptide.
  • Demux sample list may now include more than 3 columns for demultiplexing. A header name must now be specified for each column in the list to specify the sample name column, index 1 column and potentially index 2 column. Updated comments to drop reference to forward and reverse indexes - now index 1 and index 2.
  • Norm includes 3 new approaches: diff, ratio, and diff-ratio.

[1.3.3] - 2020-10-16

  • Subjoin bugfix where first line of list of provided matrix filenames may be skipped.
  • Subjoin -f flag has become two separate flags, -i (--input) and -m (--multi_file). This is to increase flexibility in providing a variety of input.
  • Subjoin gave warnings when reading the "Sequence name" header as a sample name. This issue was created from 1.3.0 feature addition.

[1.3.2] - 2020-09-20

  • Fixed issue with Zlib compilation error on Mac.
  • Added additional output features to demux to aid in diagnosis and tracing of the 2 possible indexes given and DNA tag matches.
  • Added highest density interval as filtering option for zscore module.

[1.3.1] - 2020-07-09

  • Fixed bug causing sample names to be mismatched for s_enrich module output files.
  • ZLib disabled for Mac OS temporarily to avoid compilation bug.

[1.3.0] - 2020-06-22

  • Deconv module now requires --linked file to be in format provided by link module output file.
  • Deconv Module now uses a single scoring strategy flag that takes the name of the strategy as an argument.
  • Updated help info provided by modules -h flag.
  • Fixed bug where bin module last bin size falls below minimum.
  • Changed link modules scoring strategy selection process to now use only one flag to which the strategy must be entered.
  • Added functionality for metadata file usage in link module over an ID index number.
  • Fixed bug where p_enrich -s arg referred to both the output suffix and the sample.
  • Altered the info module output for sum of column scores to fixed-point notation.
  • Fixed bug causing the max of the specified norm/score to be used in s_enrich.
  • Added option to subjoin where exclusion of a name list outputs all columns of a given matrix file.
  • Added error checking and reporting to demux sample list parser and fasta parser.
  • Fixed bug that prevented the 'pepsirf_test' executable from building.

[1.2.2] - 2020-05-01

  • (General) Added instructions for contributing to the PepSIRF software.

[1.2.1] - 2020-04-06

  • (Demux) Fixed a bug that allowed reads that had a forward index match but no reverse index match to be output with a score of zero.
  • (Demux) Fixed a bug causing a difference in scores between aggregate and translation-based non-aggregate scores.

[1.2.0] - 2020-03-30

  • Updated the help text of each module to display the current version number.
  • Fixed demux flag names that were inconsistent between the CLI input and standard output.
  • Demux can now read from gzipped fastq files.
  • Added a precision argument to norm that enables the specification of numeric precision in the output.
  • Fixed bug caused by an incomplete Codon -> AA translation map.

[1.1.0] - 2020-02-26

  • 2020-02-25: [General] Module help information now fills the entire line width on MacOS.
  • 2020-02-25: [Demux] Added support for translation-based count aggregation.
  • 2020-02-24: [Demux] Added support for reference-independent demultiplexing.

[1.0.0] - 2020-02-14

Release of the first version of PepSIRF.