Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pyOpenSci Initial Review Comments Fixup #147

Merged
merged 12 commits into from
Jul 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion .github/workflows/gen_docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,9 +41,13 @@ jobs:
rm docs/README.rst || true
rm sklearn_to_astartes.rst || true
rm docs/sklearn_to_astartes.rst || true
rm CONTRIBUTING.rst || true
rm docs/CONTRIBUTING.rst || true
m2r README.md
m2r CONTRIBUTING.md
m2r sklearn_to_astartes.md
mv README.rst docs
mv README.rst
mv CONTRIBUTING.rst docs
mv sklearn_to_astartes.rst docs
cd docs
rm *.html *.doctree || true
Expand Down
46 changes: 39 additions & 7 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,40 @@
## Contributing & Developer Notes
Pull Requests, Bug Reports, and all Contributions are welcome! Please use the appropriate issue or pull request template when making a contribution.
Pull Requests, Bug Reports, and all Contributions are welcome, encouraged, and appreciated!
Please use the appropriate [issue](https://github.com/JacksonBurns/astartes/issues/new/choose) or [pull request](https://github.com/JacksonBurns/astartes/compare) template when making a contribution to help the maintainers get it merged quickly.

We make use of [the GitHub Discussions page](https://github.com/JacksonBurns/astartes/discussions) to go over potential features to add. Please feel free to stop by if you are looking for something to develop or have an idea for a useful feature!
We make use of [the GitHub Discussions page](https://github.com/JacksonBurns/astartes/discussions) to go over potential features to add.
Please feel free to stop by if you are looking for something to develop or have an idea for a useful feature!

When submitting a PR, please mark your PR with the "PR Ready for Review" label when you are finished making changes so that the GitHub actions bots can work their magic!

### Developer Install

To contribute to the `astartes` source code, start by cloning the repository (i.e. `git clone [email protected]:JacksonBurns/astartes.git`) and then inside the repository run `pip install -e .[molecules,dev]`. This will set you up with all the required dependencies to run `astartes` and conform to our formatting standards (`black` and `isort`), which you can configure to run automatically in vscode [like this](https://marcobelo.medium.com/setting-up-python-black-on-visual-studio-code-5318eba4cd00#:~:text=Go%20to%20settings%20in%20your,%E2%80%9D%20and%20select%20%E2%80%9Cblack%E2%80%9D.).
To contribute to the `astartes` source code, start by forking and then cloning the repository (i.e. `git clone [email protected]:YourUsername/astartes.git`) and then inside the repository run `pip install -e .[dev]`. This will set you up with all the required dependencies to run `astartes` and conform to our formatting standards (`black` and `isort`), which you can configure to run automatically in VSCode [like this](https://marcobelo.medium.com/setting-up-python-black-on-visual-studio-code-5318eba4cd00#:~:text=Go%20to%20settings%20in%20your,%E2%80%9D%20and%20select%20%E2%80%9Cblack%E2%80%9D.).

> **Note**
> Windows Powershell and MacOS Catalina or newer may complain about square brackets, so you will need to double quote the `molecules` command (i.e. `pip install "astartes[molecules,dev]"`)
> Windows Powershell and MacOS Catalina or newer may complain about square brackets, so you will need to double quote the `molecules` command (i.e. `pip install "astartes[dev]"`)

### Unit Testing
All of the tests in `astartes` are written using the built-in python `unittest` module (to allow running without `pytest`) but we _highly_ recommend using `pytest`. To execute the tests from the `astartes` repository, simply type `pytest` after running the developer install (or alternately, `pytest -v` for a more helpful output).
### Version Checking

`astartes` uses `pyproject.toml` to specify all metadata _except_ the version, which is specified in `astartes/__init__.py` (via `__version__`) for backwards compatibility with Python 3.7.
To check which version of `astartes` you have installed, you can run `python -c "import astartes; print(astartes.__version__)"` on Python 3.7 or `python -c "from importlib.metadata import version; version('astartes')" on Python 3.8 or newer.

### Testing
All of the tests in `astartes` are written using the built-in python `unittest` module (to allow running without `pytest`) but we _highly_ recommend using `pytest`.
To execute the tests from the `astartes` repository, simply type `pytest` after running the developer install (or alternately, `pytest -v` for a more helpful output).
On GitHub, we use actions to run the tests on every Pull Request and on a nightly basis (look in `.github/workflows` for more information).
These tests include unit tests, functional tests, and regression tests.

### Adding New Samplers
Adding a new sampler should extend the `abstract_sampler.py` abstract base class.
Each subclass should override the `_sample` method with its own algorithm for data partitioning, and the constructor (`__init__.py`) perform any data validation.

All samplers in `astartes` are classified as one of two types: extrapolative or interpolative.
Extrapolative samplers work by clustering data into groups (which are then partitioned into train/validation/test to enforce extrapolation) whereas interpolative samplers provide an exact _order_ in which samples should be moved into the training set.

It can be as simple as a passthrough to a another `train_test_split`, or it can be an original implementation that results in X and y being split into two lists. Take a look at `astartes/samplers/random_split.py` for a basic example!
When actually implemented, this means that extrapolative samplers should set the `self._samples_clusters` attribute and interpolative samplers should set the `self._samples_idxs` attribute.

New samplers can be as simple as a passthrough to another `train_test_split`, or it can be an original implementation that results in X and y being split into two lists. Take a look at `astartes/samplers/interpolation/random_split.py` for a basic example!

After the sampler has been implemented, add it to `__init__.py` in in `astartes/samplers` and it will automatically be unit tested. Additional unit tests to verify that hyperparameters can be properly passed, etc. are also recommended.

Expand Down Expand Up @@ -71,6 +87,22 @@ If possible, we would like to also add an example Jupyter Notebook with any new

Contact [@JacksonBurns](https://github.com/JacksonBurns) if you need assistance adding an existing workflow to `astartes`. If this featurization scheme requires additional dependencies to function, we may add it as an additional _extra_ package in the same way that `molecules` in installed.

### The `train_val_test_split` Function
`train_val_test_split` is the workhorse function of `astartes`.
It is responsible for instantiating the sampling algorithm, partitioning the data into training, validation, and testing, and then returning the requested results while also keeping an eye on data types.
Under the hood, `train_test_split` is just calling `train_val_test_split` with `val_size` set to `0.0`.
For more information on how it works, check out the inline documentation in `astartes/main.py`.

### Development Philosophy

The developers of `astartes` prioritize (1) reproducibility, (2) flexibility, and (3) maintainability.
1. All versions of `astartes` `1.x` should produce the same results across all platforms, so we have thorough unit and regression testing run on a continuous basis.
2. We specify as _few dependencies as possible_ with the _loosest possible_ dependency requirements, which allows integrating `astartes` with other tools more easily.
- Dependencies which introduce a lot of requirements and/or specific versions of requirements are shuffled into the `extras_require` to avoid weighing down the main package.
- Compatibility with all versions of modern Python is achieved by not tightly specifying version numbers as well as by regression testing across all versions.
3. We follow DRY (Don't Repeat Yourself) principles to avoid code duplication and decrease maintainence burden, have near-perfect test coverage, and enforce consistent formatting style in the source code.
- Inline comments are _critical_ for maintainability - at the time of writing, `astartes` has 1 comment line for every 2 lines of source code.

## JOSS Branch

`astartes` corresponding JOSS paper is stored in this repository on a separate branch. You can find `paper.md` on the aptly named `joss-paper` branch.
Expand Down
Loading