[WIP] add migration docs and release notes (#1316)
* add migration docs and release notes

* Update doc/support.md

Co-authored-by: Taylor Reiter <[email protected]>

* Update doc/support.md

Co-authored-by: Taylor Reiter <[email protected]>

* Update doc/release-notes/sourmash-4.0.md

Co-authored-by: Taylor Reiter <[email protected]>

* Update doc/release-notes/sourmash-4.0.md

Co-authored-by: Taylor Reiter <[email protected]>

* Update doc/release-notes/sourmash-4.0.md

Co-authored-by: Taylor Reiter <[email protected]>

* update with last set of changes

* add missing line break

Co-authored-by: Taylor Reiter <[email protected]>
ctb and taylorreiter authored Feb 9, 2021
1 parent dda99fe commit 219e606
Showing 3 changed files with 132 additions and 12 deletions.
1 change: 1 addition & 0 deletions doc/release-notes/releases.md
@@ -7,6 +7,7 @@ for detailed release notes for each version!
```{toctree}
:maxdepth: 2
sourmash-4.0
sourmash-3.0
sourmash-2.0
```
71 changes: 71 additions & 0 deletions doc/release-notes/sourmash-4.0.md
@@ -0,0 +1,71 @@
# sourmash v4.0 release notes

```{contents}
:depth: 2
```

We are pleased to announce release 4.0 of sourmash! This release
contains many feature improvements and new functionality, as well as
many breaking changes with sourmash 2.x and 3.x.

Please see
[our migration guide](../support.md#migrating-from-sourmash-v3-x-to-sourmash-v4-x)
for guidance on updating to sourmash v4, and post questions about
migrating to sourmash 4.0 in the
[sourmash issue tracker](https://github.com/dib-lab/sourmash/issues/new).

## Major changes for 4.0

### New or changed behavior

* default SBT storage is now `.sbt.zip` (#1174, #1170)
* add `sourmash sketch` command for creating signatures (#1159)
* protein ksizes in MinHash are now divided by 3, except in `sourmash compute` (#1277)
* refactor MinHash API and implementation: add, iadd, merge, hashes, and max_hash (#1282, #1154, #1139, #1301)
* add HyperLogLog implementation (#1223)
* `SourmashSignature.name` is now a property (not a method): use `str(sig)` instead of `name()` (#1179, #1232)
* `lca summarize` no longer merges all signatures, and uses hash abundance by default (#1175)
* `index` and `lca index` now support `--from-file` and no longer require signature files on the command line (#1186, #1222)
* `--traverse-directory` is now on by default for signature loading behavior (#1178)

### Feature removal

* remove Python 2.7 support (& end Python 2 compatibility) (#1145, #1144)
* remove `lca gather` (#1307)
* remove 10x support from `sourmash compute` (#1229)
* remove `dump` command (#1157)

### Feature/function deprecations
* deprecate `sourmash compute` (#1159)
* deprecate `load_signatures`, `sourmash.load_one_signature`, `create_sbt_index`, and `load_sbt_index` (#1279, #1304)
* deprecate `import_csv` in favor of new `sourmash sig import --csv` (#1281)

## Refactoring, improvements, and minor bug fixes:

* accept file list in `sourmash sig cat` (#1236)
* add unique_intersect_bp and gather_result_rank to gather CSV output (#1219)
* remove deprecated minhash functions (#1149)
* fix Rust panic error in signature creation (#1172)
* cache nodes in SBT during search (#1161)
* fix two bugs in gather `--output-unassigned` (#1156)

## Documentation updates

* add information about versioning, migrations, etc to the docs (#1153)

## Infrastructure and CI changes:

* update finch requirement from 0.3.0 to 0.4.1 (#1290)
* update rand for test, and activate "js" feature for getrandom (#1275)
* dev updates (configs and doc) (#1298)
* move wheel building from Travis to GitHub Actions (#1295)
* fix new clippy warnings from Rust 1.49 (#1267)
* use tox for running tests locally (#696)
* CI: small build fixes (#1252)
* CI: Fix releases in GitHub Actions (#1250)
* update build_wheel action paths
* CI: moving python tests from travis to GH actions (#1249)
* CI: move wheel building to GitHub actions (#1244)
* remove last .rst file from docs (#1185)
* update CI for latest branch name change (#1150)
72 changes: 60 additions & 12 deletions doc/support.md
@@ -1,5 +1,9 @@
# Support, Versioning, and Migration

```{contents}
:depth: 2
```

## Asking questions and filing bugs

We do our best to support sourmash users! Users have found important
@@ -82,26 +86,70 @@ sourmash v3.x supports Python 2.7 as well as Python 3.x, through Python 3.8.

sourmash v4.0 dropped support for versions of Python before Python 3.7,
and our intent is that it will support as-yet unreleased versions of Python 3.x
(e.g. 3.10) moving forward.

For future versions of sourmash, we plan to follow the
[Numpy NEP 29](https://numpy.org/neps/nep-0029-deprecation_policy.html)
proposal for Python version support. For example, this
would mean that we would drop support for Python 3.7 on December 26,
2021.

## Migrating from sourmash v3.x to sourmash v4.x.

Our intent is to provide our users with a clear migration path between versions. We rely on *semantic versioning* and deprecation warnings to do this:

* Within each major version release (v2, v3, v4), the command-line interface and Python APIs should remain the same, with features being only *added*.
* Across major versions (e.g. v2 to v3, and v3 to v4) we provide warnings when functionality will change in the next major version.

So: if you want to upgrade workflows and scripts from prior releases of sourmash to sourmash v4.0, we suggest doing this in two stages.

First, upgrade to the latest version of sourmash 3.5.x (currently [v3.5.0](https://github.com/dib-lab/sourmash/releases/tag/v3.5.0)), which is compatible with all files and command lines used in previous versions of sourmash (v2.x and v3.x). After upgrading to 3.5.x, scan the sourmash output for deprecation warnings and fix those.

Next, upgrade to the latest version of 4.x, which removes or changes the functionality flagged by those deprecation warnings and is therefore not fully backwards compatible.
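
The two-stage upgrade can be expressed as a small decision helper. This is a minimal sketch, not part of sourmash: `parse_version` and `next_migration_step` are hypothetical names, and the code assumes a plain numeric `MAJOR.MINOR.PATCH` version string such as `3.4.1`.

```python
def parse_version(version: str) -> tuple:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of ints (assumes no pre-release suffix)."""
    return tuple(int(part) for part in version.split(".")[:3])


def next_migration_step(installed: str) -> str:
    """Hypothetical helper: suggest the next step on the way to sourmash v4.x."""
    major, minor = parse_version(installed)[:2]
    if major < 3 or (major == 3 and minor < 5):
        # Stage 1: move to the latest 3.5.x, which still accepts v2.x/v3.x files and command lines.
        return "upgrade to the latest 3.5.x and fix deprecation warnings"
    if major == 3:
        # Already on 3.5.x: clear the deprecation warnings, then take the breaking upgrade.
        return "fix any remaining deprecation warnings, then upgrade to 4.x"
    return "already on sourmash 4.x or later"


print(next_migration_step("3.4.1"))
```

If sourmash is installed, `importlib.metadata.version("sourmash")` (Python 3.8+) returns its version string; pre-release strings such as `4.0.0rc1` would need extra parsing before being passed to this sketch.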

The major changes are detailed below; please see the [full release notes for 4.0](release-notes/sourmash-4.0.md) for all the details and links to the code changes.

### Sourmash command line

If you use sourmash from the command line, there are a few major changes in 4.0 that you should know about.

First, **`sourmash compute` is deprecated in favor of [`sourmash sketch`](sourmash-sketch.md)**, which provides quite a bit more flexibility in creating signatures.

Second, **`sourmash index` will now save databases in the Zip format (`.sbt.zip`) instead of the old JSON+subdirectory format** (see [updated docs](command-line.md#sourmash-index-build-an-sbt-index-of-signatures)). You can revert to the old behavior by explicitly specifying the `.sbt.json` filename for output when running `sourmash index`.

Third, all sourmash commands that operate on signatures should now be able to read signatures directly from signature files, SBT databases, LCA databases, directories, and files containing lists of filenames (see [updated docs](command-line.md#advanced-command-line-usage)).
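
A file containing a list of filenames can be generated with a few lines of Python. This is a minimal sketch that assumes the list format is one path per line and that signature files end in `.sig`; the helper name `write_signature_list` is hypothetical. The resulting file could then be passed to a command that accepts `--from-file`, such as `sourmash index`.

```python
from pathlib import Path


def write_signature_list(sig_dir: str, list_path: str, pattern: str = "*.sig") -> int:
    """Write one matching signature path per line to list_path; return how many were written."""
    # rglob searches sig_dir recursively; sorting keeps the list order stable between runs.
    paths = sorted(str(p) for p in Path(sig_dir).rglob(pattern) if p.is_file())
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For example, `write_signature_list("signatures/", "sig-list.txt")` would collect every `.sig` file under a hypothetical `signatures/` directory into `sig-list.txt`.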

Fourth, if you use `sourmash lca` commands, **`sourmash lca gather` has been removed**. In addition, there are some **changes in how `summarize` works**: it now uses abundances by default, and no longer combines all signatures before summarizing. Specify `--ignore-abundance` and combine your signatures using `sourmash sig merge` to recover the old behavior. Note also that `lca summarize` now includes a new column, `filename`, in the CSV output.

Finally, **k-mer sizes have changed for amino acid sequences** in v4. If you use protein, Dayhoff, or HP signatures, we now interpret k-mer sizes differently on the command line. Briefly, k-mer sizes for protein/dayhoff/hp signatures are now the size of the k-mer in amino acid space, *not* the size of the corresponding k-mer in DNA space (as previously used). In practice this means that you need to divide all your old k-mer sizes by 3 when working with k-mers in amino acid space!

Note also that while `sourmash compute` still behaves the same way in v4.x as it did in sourmash 3.5.x, `sourmash sketch translate` and `sourmash sketch protein` both use the *new* approach to amino acid k-mer sizes, as do all of the command line options for searching, manipulation, and display. Again, in practice this means that you need to divide all your old k-mer sizes by 3 if they apply to amino acid k-mers.
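
The conversion is simple integer division. The helper below is a hypothetical sketch (not a sourmash function) that assumes every old protein/dayhoff/hp k-mer size was written as 3 × (number of amino acids), and refuses values that are not multiples of 3 rather than guessing.

```python
def v3_to_v4_protein_ksize(old_ksize: int) -> int:
    """Convert a v3-style (DNA-space) protein/dayhoff/hp k-mer size to the v4 amino acid size."""
    # v3 expressed amino acid k-mer sizes as 3 * (number of amino acids).
    if old_ksize % 3 != 0:
        raise ValueError(f"k={old_ksize} is not a multiple of 3; was it really a v3 protein k-mer size?")
    return old_ksize // 3


# A v3 protein k-mer size of 21 corresponds to 7 amino acids in v4.
print([v3_to_v4_protein_ksize(k) for k in (21, 30, 57)])  # [7, 10, 19]
```

Keep the old values for `sourmash compute`, which still uses the v3 convention; convert them only for `sourmash sketch translate`, `sourmash sketch protein`, and the search, manipulation, and display options.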

There are also a few minor changes that can affect existing command lines; in these cases sourmash should report an appropriate error message:

* `--traverse-directory` is no longer needed on the command line for `sourmash index` or other functions; directory traversal happens automatically.
* the command lines for `sourmash index` and `sourmash lca index` no longer require signature files to be specified, which can break existing command lines. To fix this, reorder arguments so that any signatures are specified at the end of the command line.

### Python API

First, all k-mer sizes for `protein`, `dayhoff`, and `hp` signatures have changed in the Python layer to be "correct", i.e., to be the size of the protein k-mer. Previously they were 3\*k, i.e. based on the size of the DNA k-mer from which the protein sequence would have been created.

Second, the `MinHash` class API has changed significantly!
* `get_mins()` has been deprecated in favor of `.hashes`, which is a dictionary that contains abundances.
* `merge` now just modifies `MinHash` objects in-place, and no longer returns the merged object; use `__iadd__` (`+=`) for the old behavior, or `__add__` (`+`) to create a new merged object.
* `max_hash` has been deprecated in favor of `scaled`.
* instead of `downsample_scaled(s)` use `downsample(scaled=s)`
* instead of `downsample_n(m)` use `downsample(num=m)`
* `is_molecule_type` has been replaced with a property, `moltype` -- instead of `is_molecule_type(t)` use `moltype == t`.
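
To find v3-era `MinHash` calls in existing scripts, a rough text search is often enough to start. This is a minimal sketch built only from the list above; `MINHASH_MIGRATIONS` and `find_minhash_migrations` are hypothetical names, and because it matches text rather than parsing Python it can miss aliased calls and may flag unrelated `.merge(` calls.

```python
import re

# Hypothetical lookup table: regex for v3-era MinHash usage -> suggested v4 replacement.
MINHASH_MIGRATIONS = {
    r"\.get_mins\(": "use the .hashes property instead",
    r"\.downsample_scaled\(": "use .downsample(scaled=...) instead",
    r"\.downsample_n\(": "use .downsample(num=...) instead",
    r"\.is_molecule_type\(": "compare the .moltype property instead",
    r"\bmax_hash\b": "use scaled instead",
    r"\.merge\(": "merge() no longer returns the merged object; use += or + if you need it",
}


def find_minhash_migrations(source: str) -> list:
    """Return (line_number, suggestion) pairs for v3-era MinHash usage found in source text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in MINHASH_MIGRATIONS.items():
            if re.search(pattern, line):
                findings.append((lineno, suggestion))
    return findings


script = "mins = mh.get_mins()\nsmall = mh.downsample_scaled(2000)\n"
for lineno, hint in find_minhash_migrations(script):
    print(lineno, hint)
```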


Third, `SourmashSignature` objects no longer have a `name()` method but instead a `name` property, which can be assigned to. This property is now `None` when no name has been assigned. Note that `str(sig)` should now be used to retrieve a display name, and should replace all previous uses of `sig.name()`.

Fourth, a few top-level functions have been deprecated: `load_signatures(...)`, `load_one_signature(...)`, `create_sbt_index(...)`, and `load_sbt_index(...)`.
* `load_signatures(...)` and `load_one_signature(...)` should be replaced with `load_file_as_signatures(...)`. Note that there is currently no top-level way to load signatures from strings. For now, if you need that functionality, you can use `sourmash.signature.load_signatures(...)` and `sourmash.signature.load_one_signature(...)`, but please be aware that these are not considered part of the public API under semantic versioning, so they may change in the next minor point release; this is tracked in https://github.com/dib-lab/sourmash/issues/1312.
* `load_sbt_index(...)` has been deprecated. Please use `load_file_as_index(...)` instead.
* `create_sbt_index(...)` has been deprecated. There is currently no replacement, although you can use it directly from `sourmash.sbtmh` if necessary.

Fifth, directory traversal now happens by default when loading signatures, so remove `traverse=True` arguments to several functions in `sourmash_args`: `load_dbs_and_sigs`, `load_file_as_index`, and `load_file_as_signatures`.

Prior to the release of sourmash v4, we added deprecation warnings and/or future warnings to all APIs and modules in sourmash v3.x that are being removed in v4.0. If you are using the Python API, we suggest you use the following procedure to migrate:

* first, install the latest version of sourmash v3, which should be v3.5.0 or later.
* then, turn on `DeprecationWarning`s in your code per [the warnings module documentation](https://docs.python.org/3/library/warnings.html#overriding-the-default-filter).
* now, run python with the argument `-W error` to turn warnings into errors.
* fix all errors!
* finally, upgrade to sourmash v4.0.
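
The "turn warnings into errors" step of the migration procedure can also be done from inside Python, scoped to the code you want to check. This is a minimal sketch using the standard `warnings` module; `legacy_function` is a hypothetical stand-in for any v3 API that emits a deprecation warning (it is not a sourmash function), and `check_for_deprecations` is likewise a hypothetical name.

```python
import warnings


def legacy_function():
    """Hypothetical stand-in for a v3 API that is slated for removal."""
    warnings.warn("legacy_function is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def check_for_deprecations(func, *args, **kwargs):
    """Call func with DeprecationWarning and FutureWarning escalated to exceptions."""
    with warnings.catch_warnings():
        # The "error" filters apply only inside this block; previous filters are restored on exit.
        warnings.simplefilter("error", DeprecationWarning)
        warnings.simplefilter("error", FutureWarning)
        return func(*args, **kwargs)


try:
    check_for_deprecations(legacy_function)
except DeprecationWarning as exc:
    print(f"fix before upgrading: {exc}")
```

To apply the same policy to a whole script from the command line, `python -W error::DeprecationWarning your_script.py` escalates only deprecation warnings, while `-W error` escalates every warning.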

Please post questions and concerns to the
[sourmash issue tracker](https://github.com/dib-lab/sourmash/issues)
and we'll be happy to help!
