0.6.0 #429

Merged: 168 commits, May 10, 2024

Commits (168)

c49cee4
bump
Linux-cpp-lisp Dec 19, 2022
585c5bd
`model_dtype` initial
Linux-cpp-lisp Dec 19, 2022
3cb9854
GraphModel
Linux-cpp-lisp Dec 19, 2022
3705fbf
promote dtype in loss
Linux-cpp-lisp Dec 19, 2022
09bb5db
Working `model_dtype`
Linux-cpp-lisp Dec 19, 2022
c1d68a4
Fix equivariance tests
Linux-cpp-lisp Dec 19, 2022
5fa3e49
passing tests
Linux-cpp-lisp Dec 19, 2022
d563759
Use Cholesky for solver
Linux-cpp-lisp Dec 20, 2022
617d3fc
docstring
Linux-cpp-lisp Dec 20, 2022
0657fef
bump
Linux-cpp-lisp Dec 20, 2022
2ce69d4
seed changes
Linux-cpp-lisp Dec 20, 2022
20c2529
better logging default
Linux-cpp-lisp Dec 20, 2022
1c63525
ssp -> silu default
Linux-cpp-lisp Dec 20, 2022
631904c
missing test seed
Linux-cpp-lisp Dec 20, 2022
c281f56
updated tests for float64/float32 mixed
Linux-cpp-lisp Dec 21, 2022
28df610
FMA with version check
Linux-cpp-lisp Dec 21, 2022
2a6d27c
Merge branch 'develop' into model-dtype
Linux-cpp-lisp Dec 21, 2022
15edf66
remove fixed_fields machinery
Linux-cpp-lisp Dec 21, 2022
0d40579
Merge branch 'develop' into model-dtype
Linux-cpp-lisp Dec 21, 2022
50539bc
1.10 compat
Linux-cpp-lisp Dec 21, 2022
dcf77ae
Merge branch 'develop' into model-dtype
Linux-cpp-lisp Dec 21, 2022
86857c3
lint
Linux-cpp-lisp Dec 21, 2022
77b8971
ensure dtype reset if error
Linux-cpp-lisp Dec 21, 2022
9ddd616
fix type promotion in scaling
Linux-cpp-lisp Dec 21, 2022
9b95404
fix to new return format
Linux-cpp-lisp Dec 21, 2022
d1f4da3
make tests more efficient
Linux-cpp-lisp Dec 21, 2022
1a4a3ca
lint
Linux-cpp-lisp Dec 21, 2022
75d9286
cheaper? dtype promotion
Linux-cpp-lisp Dec 21, 2022
aaa061c
Run tests on multiple GPUs when available
Linux-cpp-lisp Dec 21, 2022
9c8a998
multi gpu pytest
Linux-cpp-lisp Dec 21, 2022
075d2b0
Merge branch 'develop' into model-dtype
Linux-cpp-lisp Dec 21, 2022
436e5cf
Fix tests
Linux-cpp-lisp Dec 22, 2022
ed9a328
more info in equivar test failure
Linux-cpp-lisp Dec 22, 2022
5973497
warn on default_dtype=float32
Linux-cpp-lisp Dec 22, 2022
f244deb
get_device() helper
Linux-cpp-lisp Dec 22, 2022
66e9707
graph_model model builders
Linux-cpp-lisp Dec 22, 2022
dd6c7ba
More robust embedding cutoff test
Linux-cpp-lisp Jan 9, 2023
d0f9fc5
adding Tensorboard as logger (#289)
nw13slx Jan 10, 2023
284808c
Merge branch 'develop' into model-dtype
Linux-cpp-lisp Jan 10, 2023
33f6751
partial epochs
Linux-cpp-lisp Jan 11, 2023
34dc79f
match tolerance to dtype
Linux-cpp-lisp Jan 11, 2023
96d6fee
Work with `wandb>=0.13.8`
Linux-cpp-lisp Jan 12, 2023
9713ede
docs
Linux-cpp-lisp Jan 12, 2023
d43a725
handle data type
Linux-cpp-lisp Jan 18, 2023
63099ab
benchmark with explicit CUDA sync too
Linux-cpp-lisp Jan 18, 2023
ee0b15d
remove torch in setup.py
Linux-cpp-lisp Jan 19, 2023
9da6bb0
version bump
Linux-cpp-lisp Jan 19, 2023
922b622
add batch ptr key to avoid .max() calls
Linux-cpp-lisp Jan 23, 2023
3db6752
looping for partialsampler
Linux-cpp-lisp Jan 24, 2023
703142a
update with plugin section
Linux-cpp-lisp Jan 24, 2023
ad2b2f1
fix with graphmodel
Linux-cpp-lisp Jan 24, 2023
032031c
thresholds
Linux-cpp-lisp Feb 2, 2023
fd74bc6
improve NequIP numerics
Linux-cpp-lisp Feb 2, 2023
f695c87
Merge branch 'develop' into model-dtype
Linux-cpp-lisp Feb 2, 2023
75976a0
update changelog
Linux-cpp-lisp Feb 2, 2023
15d1b5e
Merge branch 'main' into develop
Linux-cpp-lisp Feb 2, 2023
9e983cd
lint
Linux-cpp-lisp Feb 2, 2023
8f13ecb
fix trainer test
Linux-cpp-lisp Feb 2, 2023
c651445
remove related_scale/shift_keys
Linux-cpp-lisp Feb 3, 2023
e471c8d
stress units note
Linux-cpp-lisp Feb 3, 2023
580a76a
test PartialSampler
Linux-cpp-lisp Feb 4, 2023
f052903
add force test
Linux-cpp-lisp Feb 4, 2023
0c4baad
improved batch indexing in stress
Linux-cpp-lisp Feb 4, 2023
cdc36b5
test wrapped vs unwrapped consistent
Linux-cpp-lisp Feb 4, 2023
efa9c20
RDF
Linux-cpp-lisp Feb 4, 2023
d569419
remove unnecessary sum
Linux-cpp-lisp Feb 7, 2023
4b870ca
remove weird print
Linux-cpp-lisp Feb 7, 2023
005104d
don't batch / scatter unnecessarily
Linux-cpp-lisp Feb 7, 2023
6b4c0d5
addmm
Linux-cpp-lisp Feb 7, 2023
9cdf325
JIT
Linux-cpp-lisp Feb 7, 2023
b59d8cf
double check for valid autograd graph
Linux-cpp-lisp Feb 7, 2023
289dd13
fix dimensions in special case
Linux-cpp-lisp Feb 7, 2023
f5c19c9
text wrapped more
Linux-cpp-lisp Feb 7, 2023
265fa01
initial pair potentials
Linux-cpp-lisp Feb 8, 2023
abdc6f2
fix tests
Linux-cpp-lisp Feb 8, 2023
20dddf9
Merge branch 'develop' into pair-potential
Linux-cpp-lisp Feb 8, 2023
fcb921f
Use less data nequip-benchmark
Linux-cpp-lisp Feb 10, 2023
721777b
Merge branch 'develop' into pair-potential
Linux-cpp-lisp Feb 10, 2023
9fbee91
warn on override of default dtype
Linux-cpp-lisp Feb 10, 2023
3167afe
record and restore model and default dtype in deployment
Linux-cpp-lisp Feb 10, 2023
98a510f
refactor
Linux-cpp-lisp Feb 10, 2023
8ab8895
fix when cell not present
Linux-cpp-lisp Feb 10, 2023
25c42d7
fix RDF to give self-self RDFs
Linux-cpp-lisp Feb 10, 2023
7e0ecf8
add plotting script
Linux-cpp-lisp Feb 12, 2023
9e0e572
pair
Linux-cpp-lisp Feb 12, 2023
a9866aa
rescale only when there is a scale
Linux-cpp-lisp Feb 13, 2023
e941b4f
Test ZBL against LAMMPS
Linux-cpp-lisp Feb 13, 2023
e36afb5
ensure config
Linux-cpp-lisp Feb 13, 2023
e9c7a8a
less indexing
Linux-cpp-lisp Feb 13, 2023
5a7f328
allow empty tensors past tests
Linux-cpp-lisp Feb 13, 2023
1c056cf
test force smoothness
Linux-cpp-lisp Feb 13, 2023
9a71e15
test ZBL thoroughly
Linux-cpp-lisp Feb 13, 2023
a232c88
Test with pair potential
Linux-cpp-lisp Feb 13, 2023
73e4142
GPU OOM offloading mode (#300)
Linux-cpp-lisp Feb 13, 2023
2dbcd6f
Merge branch 'develop' into pair-potential
Linux-cpp-lisp Feb 13, 2023
f5a19f4
lint
Linux-cpp-lisp Feb 13, 2023
ead9c5d
--output-fields-from-original-dataset
Linux-cpp-lisp Feb 14, 2023
cb3a347
Add parity plot example script
Linux-cpp-lisp Feb 14, 2023
90d7c0c
Warn/error on unused keys (#301)
Linux-cpp-lisp Feb 15, 2023
3808249
fix unused error for LR, early stopping, etc. options
Linux-cpp-lisp Feb 20, 2023
e1ee4c6
remove default run name
Linux-cpp-lisp Feb 20, 2023
29f089a
more aggressive test to compensate for nondet numerics
Linux-cpp-lisp Feb 21, 2023
91e498f
backward compatibility
Linux-cpp-lisp Feb 21, 2023
3ac367b
backwards compat, again
Linux-cpp-lisp Feb 21, 2023
28c0643
fix relaxed atol to be in both checks
Linux-cpp-lisp Feb 23, 2023
6fa97f3
add error
Linux-cpp-lisp Feb 28, 2023
8f3e6f3
StressForceOutput default
Linux-cpp-lisp Feb 28, 2023
373e120
better version parsing
Linux-cpp-lisp Feb 28, 2023
aecf025
global options fuse
Linux-cpp-lisp Feb 28, 2023
57789df
absmax
Linux-cpp-lisp Feb 28, 2023
190a9aa
document absmax
Linux-cpp-lisp Feb 28, 2023
9f99f8e
better statistics N<2 error than nan
Linux-cpp-lisp Mar 1, 2023
d88687d
fix default
Linux-cpp-lisp Mar 1, 2023
539a1a4
document
Linux-cpp-lisp Mar 1, 2023
ff2d2c6
fix on CPU
Linux-cpp-lisp Mar 2, 2023
3a6ae34
fix data count errors
Linux-cpp-lisp Mar 2, 2023
08a35be
fix load_model_state with CUDA to CPU
Linux-cpp-lisp Mar 3, 2023
54805d2
Fix torchscript when no shifts/scales
Linux-cpp-lisp Mar 6, 2023
3d14cbe
Add HDF5 based dataset option (#227)
peastman Mar 17, 2023
93f2112
slightly relax test numerics in float32
Linux-cpp-lisp Mar 19, 2023
a3f7536
fix tests
Linux-cpp-lisp Mar 20, 2023
12d3da9
better message
Linux-cpp-lisp Mar 23, 2023
73b0c6f
fix adjacency test
Linux-cpp-lisp Mar 24, 2023
327a250
allow registered extra metadata
Linux-cpp-lisp Mar 24, 2023
e30ce3e
remove stress warning
Linux-cpp-lisp Mar 24, 2023
0d8a567
fix type converstion for type_to_chemical_symbol
Linux-cpp-lisp Mar 27, 2023
c3ab697
consistency with minimal.yaml
Linux-cpp-lisp Mar 27, 2023
bc162cc
add NEQUIP_ERROR_ON_NO_EDGES
Linux-cpp-lisp Mar 27, 2023
18c37a2
doc
Linux-cpp-lisp Mar 28, 2023
f1e0b74
add freeze option
Linux-cpp-lisp Apr 6, 2023
671c369
EDGE_CUTOFF_KEY
Linux-cpp-lisp Apr 7, 2023
8310052
lint
Linux-cpp-lisp Apr 7, 2023
6967c1b
fix typo
Linux-cpp-lisp Apr 14, 2023
621f57c
typo
Linux-cpp-lisp Apr 16, 2023
a9c96a4
fix lr sched docs
Linux-cpp-lisp Apr 16, 2023
da3c2b7
add logging
Linux-cpp-lisp Apr 21, 2023
26e2645
better error message
Linux-cpp-lisp May 8, 2023
4673bc4
GMM Uncertainty Quantification (#310)
albertzhu01 May 12, 2023
f156438
allow_tf32 default true
Linux-cpp-lisp May 12, 2023
00f4da8
remove _params suffix
Linux-cpp-lisp May 12, 2023
47375ef
docs updates
Linux-cpp-lisp May 12, 2023
9e94e99
style docs
Linux-cpp-lisp May 12, 2023
5bddd74
favicon
Linux-cpp-lisp May 12, 2023
c1d7cd9
docs, again
Linux-cpp-lisp May 12, 2023
d1cddec
add more printed warnings
Linux-cpp-lisp May 25, 2023
15f036d
don't require sklearn for whole package
Linux-cpp-lisp May 31, 2023
dfbce31
warnings on version mismatch
Linux-cpp-lisp Jun 2, 2023
1473cc8
Added `edge_energy` to `ALL_ENERGY_KEYS` subjecting it to global rescale
Linux-cpp-lisp Jun 5, 2023
32bad0c
add simple LJ
Linux-cpp-lisp Jun 22, 2023
2f43aa8
put the right versions in deployed models
Linux-cpp-lisp Jun 22, 2023
0b02c41
No negative volumes in rare cases
Linux-cpp-lisp Jun 28, 2023
3f03c77
set PYTORCH_JIT_USE_NNC_NOT_NVFUSER by default
Linux-cpp-lisp Jul 29, 2023
4aabe9f
Add nequip-deploy build --checkpoint
Linux-cpp-lisp Oct 11, 2023
3fd2213
nequip-deploy --override
Linux-cpp-lisp Oct 11, 2023
2185c7a
more complete memory summary
Linux-cpp-lisp Jan 29, 2024
04d272d
warn on unsupported types in AtomicData
Linux-cpp-lisp Jan 29, 2024
bf54de8
fix type warning
Linux-cpp-lisp Jan 30, 2024
bffd533
add training blowup sanity threshold to example.yaml
Linux-cpp-lisp Jan 30, 2024
e96ccd4
Update `.readthedocs.yaml` (#418)
kavanase Apr 19, 2024
4bf8820
Fix docs dependencies (#420)
kavanase Apr 19, 2024
c310ad6
add matscipy neighborlist option (#1)
cw-tan Apr 30, 2024
9b5b17c
Fix dataset unit rescaling of per-species shifts (#2)
Linux-cpp-lisp Apr 30, 2024
7fcd45d
remove unused
Linux-cpp-lisp May 1, 2024
9ba1d5f
Add SimpleLossSchedule
Linux-cpp-lisp May 1, 2024
03a4b45
Merge branch 'main' into develop
Linux-cpp-lisp May 10, 2024
f2a40fe
Cleanup
Linux-cpp-lisp May 10, 2024
0cc2e31
Bump version CHANGELOG
Linux-cpp-lisp May 10, 2024
ef79965
Update PyTorch version for tests
Linux-cpp-lisp May 10, 2024

Files changed

2 changes: 1 addition & 1 deletion .github/workflows/lint.yaml
@@ -29,7 +29,7 @@ jobs:
python-version: '3.x'
- name: Install flake8
run: |
pip install flake8==4.0.1
pip install flake8==7.0.0
- name: run flake8
run: |
flake8 . --count --show-source --statistics
3 changes: 2 additions & 1 deletion .github/workflows/tests.yml
@@ -16,7 +16,7 @@ jobs:
strategy:
matrix:
python-version: [3.9]
torch-version: [1.11.0, 1.12.1]
torch-version: [1.13.1, "2.*"]

steps:
- uses: actions/checkout@v2
@@ -32,6 +32,7 @@ jobs:
python -m pip install --upgrade pip
pip install setuptools wheel
pip install torch==${TORCH} -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install h5py scikit-learn # install packages that aren't required dependencies but that the tests do need
pip install --upgrade-strategy only-if-needed .
- name: Install pytest
run: |
3 changes: 2 additions & 1 deletion .github/workflows/tests_develop.yml
@@ -16,7 +16,7 @@ jobs:
strategy:
matrix:
python-version: [3.9]
torch-version: [1.12.1]
torch-version: ["2.*"]

steps:
- uses: actions/checkout@v2
@@ -32,6 +32,7 @@ jobs:
python -m pip install --upgrade pip
pip install setuptools wheel
pip install torch==${TORCH} -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install h5py scikit-learn # install packages that aren't required dependencies but that the tests do need
pip install --upgrade-strategy only-if-needed .
- name: Install pytest
run: |
20 changes: 20 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,20 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file for details

# Required
version: 2

build:
os: ubuntu-22.04
tools:
python: "3.9"

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py

# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/requirements.txt
46 changes: 46 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,51 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

Most recent change on the bottom.

## Unreleased

## [0.6.0] - 2024-5-10
### Added
- add Tensorboard as logger option
- [Breaking] Refactor overall model logic into `GraphModel` top-level module
- [Breaking] Added `model_dtype`
- `BATCH_PTR_KEY` in `AtomicDataDict`
- `AtomicInMemoryDataset.rdf()` and `examples/rdf.py`
- `type_to_chemical_symbol`
- Pair potential terms
- `nequip-evaluate --output-fields-from-original-dataset`
- Error (or warn) on unused options in YAML that likely indicate typos
- `dataset_*_absmax` statistics option
- `HDF5Dataset` (#227)
- `include_file_as_baseline_config` for simple modifications of existing configs
- `nequip-deploy --using-dataset` to support data-dependent deployment steps
- Support for Gaussian Mixture Model uncertainty quantification (https://doi.org/10.1063/5.0136574)
- `start_of_epoch_callbacks`
- `nequip.train.callbacks.loss_schedule.SimpleLossSchedule` for changing the loss coefficients at specified epochs
- `nequip-deploy build --checkpoint` and `--override` to avoid many largely duplicated YAML files
- matscipy neighborlist support enabled with `NEQUIP_MATSCIPY_NL` environment variable

### Changed
- Always require explicit `seed`
- [Breaking] Set `dataset_seed` to `seed` if it is not explicitly provided
- Don't log as often by default
- [Breaking] Default nonlinearities are `silu` (`e`) and `tanh` (`o`)
- Will not reproduce previous versions' data shuffling order (for all practical purposes this does not matter, the `shuffle` option is unchanged)
- [Breaking] `default_dtype` defaults to `float64` (`model_dtype` default `float32`, `allow_tf32: true` by default--- see https://arxiv.org/abs/2304.10061)
- `nequip-benchmark` now only uses `--n-data` frames to build the model
- [Breaking] By default models now use `StressForceOutput`, not `ForceOutput`
- Added `edge_energy` to `ALL_ENERGY_KEYS` subjecting it to global rescale

### Fixed
- Work with `wandb>=0.13.8`
- Better error for standard deviation with too few data
- `load_model_state` GPU -> CPU
- No negative volumes in rare cases

### Removed
- [Breaking] `fixed_fields` machinery (`npz_fixed_field_keys` is still supported, but through a more straightforward implementation)
- Default run name/WandB project name of `NequIP`, they must now always be provided explicitly
- [Breaking] Removed `_params` as an allowable subconfiguration suffix (i.e. instead of `optimizer_params` now only `optimizer_kwargs` is valid, not both)
- [Breaking] Removed `per_species_rescale_arguments_in_dataset_units`

## [0.5.6] - 2022-12-19
### Added
@@ -14,6 +59,7 @@ Most recent change on the bottom.
- `nequip-benchmark --no-compile` and `--verbose` and `--memory-summary`
- `nequip-benchmark --pdb` for debugging model (builder) errors
- More information in `nequip-deploy info`
- GPU OOM offloading mode

### Changed
- Minimum e3nn is now 0.4.4
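For concreteness, the precision-related entries in the 0.6.0 changelog above map onto config keys like the following. This is a minimal sketch that simply restates values taken from the updated `configs/example.yaml` further down in this diff; it adds nothing beyond what the PR itself shows.

```yaml
# sketch of the 0.6.0 precision and seed settings (values mirror configs/example.yaml below)
seed: 123                # must now always be given explicitly
dataset_seed: 456        # if omitted, falls back to `seed`
default_dtype: float64   # data and statistics are handled in float64 ...
model_dtype: float32     # ... while the model itself runs in float32
allow_tf32: true         # new default; see https://arxiv.org/abs/2304.10061
```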
12 changes: 10 additions & 2 deletions README.md
@@ -13,11 +13,13 @@ NequIP is an open-source code for building E(3)-equivariant interatomic potentia
NequIP requires:

* Python >= 3.7
* PyTorch >= 1.8, !=1.9, <=1.11.*. PyTorch can be installed following the [instructions from their documentation](https://pytorch.org/get-started/locally/). Note that neither `torchvision` nor `torchaudio`, included in the default install command, are needed for NequIP.
* PyTorch == `1.11.*` or `1.13.*` or later (do **not** use `1.12`). (Some users have observed silent issues with PyTorch 2+, as reported in #311. Please report any similar issues you encounter.) PyTorch can be installed following the [instructions from their documentation](https://pytorch.org/get-started/locally/). Note that neither `torchvision` nor `torchaudio`, included in the default install command, are needed for NequIP.

**You must install PyTorch before installing NequIP, however it is not marked as a dependency of `nequip` to prevent `pip` from trying to overwrite your PyTorch installation.**

To install:

* We use [Weights&Biases](https://wandb.ai) to keep track of experiments. This is not a strict requirement — you can use our package without it — but it may make your life easier. If you want to use it, create an account [here](https://wandb.ai) and install the Python package:
* We use [Weights&Biases](https://wandb.ai) (or TensorBoard) to keep track of experiments. This is not a strict requirement — you can use our package without it — but it may make your life easier. If you want to use it, create an account [here](https://wandb.ai) and install the Python package:

```
pip install wandb
@@ -130,6 +132,12 @@ pair_coeff * * deployed.pth <NequIP type for LAMMPS type 1> <NequIP type for LAM

For installation instructions, please see the [`pair_nequip` repository](https://github.com/mir-group/pair_nequip).

## Plugins / extending `nequip`

`nequip` is a modular framework and extension packages can provide new model components, architectures, etc. The main extension package(s) currently available are:
- [Allegro](https://github.com/mir-group/allegro): implements the highly parallelizable Allegro model architecture.

Details on writing and using plugins can be found in the [Allegro tutorial](https://colab.research.google.com/drive/1yq2UwnET4loJYg_Fptt9kpklVaZvoHnq) and in [`nequip-example-extension`](https://github.com/mir-group/nequip-example-extension/).

## References & citing

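Since the README insists that PyTorch be installed before `nequip` (it is deliberately not declared as a pip dependency), a typical CPU-only setup looks roughly like this. The torch line is a sketch adapted from the CI workflows above; pick the wheel matching your platform/CUDA version per the PyTorch instructions, and `wandb` remains optional.

```bash
# sketch of the install order described in the README (CPU wheel shown, as in the CI workflows)
pip install "torch==2.*" -f https://download.pytorch.org/whl/cpu/torch_stable.html  # PyTorch first
pip install wandb    # optional experiment tracking
pip install nequip   # then NequIP itself (or `pip install .` from a source checkout)
```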
22 changes: 12 additions & 10 deletions configs/example.yaml
@@ -9,7 +9,11 @@ run_name: example-run-toluene
seed: 123 # model seed
dataset_seed: 456 # data set seed
append: true # set true if a restarted run should append to the previous log file
default_dtype: float32 # type of float to use, e.g. float32 and float64

# see https://arxiv.org/abs/2304.10061 for discussion of numerical precision
default_dtype: float64
model_dtype: float32
allow_tf32: true # consider setting to false if you plan to mix training/inference over any devices that are not NVIDIA Ampere or later

# network
r_max: 4.0 # cutoff radius in length units, here Angstrom, this is an important hyperparamter to scan
@@ -68,7 +72,7 @@ wandb: true
wandb_project: toluene-example # project name used in wandb

verbose: info # the same as python logging, e.g. warning, info, debug, error; case insensitive
log_batch_freq: 10 # batch frequency, how often to print training errors within the same epoch
log_batch_freq: 100 # batch frequency, how often to print training errors within the same epoch
log_epoch_freq: 1 # epoch frequency, how often to print
save_checkpoint_freq: -1 # frequency to save the intermediate checkpoint. no saving of intermediate checkpoints when the value is not positive.
save_ema_checkpoint_freq: -1 # frequency to save the intermediate ema checkpoint. no saving of intermediate checkpoints when the value is not positive.
@@ -95,6 +99,9 @@ early_stopping_patiences:
early_stopping_lower_bounds: # stop early if a metric value is lower than the bound
LR: 1.0e-5

early_stopping_upper_bounds: # stop early if the training appears to have exploded
validation_loss: 1.0e+4

# loss function
loss_coeffs:
forces: 1 # if using PerAtomMSELoss, a default weight of 1:1 on each should work well
@@ -141,17 +148,12 @@ lr_scheduler_factor: 0.5
# the default is to scale the atomic energy and forces by scaling them by the force standard deviation and to shift the energy by the mean atomic energy
# in certain cases, it can be useful to have a trainable shift/scale and to also have species-dependent shifts/scales for each atom

# whether the shifts and scales are trainable. Defaults to False. Optional
per_species_rescale_shifts_trainable: false
per_species_rescale_scales_trainable: false

# initial atomic energy shift for each species. default to the mean of per atom energy. Optional
# the value can be a constant float value, an array for each species, or a string that defines a statistics over the training dataset
# if numbers are explicitly provided, they must be in the same energy units as the training data
per_species_rescale_shifts: dataset_per_atom_total_energy_mean

# initial atomic energy scale for each species. Optional.
# the value can be a constant float value, an array for each species, or a string
per_species_rescale_scales: dataset_forces_rms

# if explicit numbers are given for the shifts/scales, this parameter must specify whether the given numbers are unitless shifts/scales or are in the units of the dataset. If ``True``, any global rescalings will correctly be applied to the per-species values.
# per_species_rescale_arguments_in_dataset_units: True
# if numbers are explicitly provided, they must be in the same energy units as the training data
per_species_rescale_scales: null
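Note that `per_species_rescale_scales` above now defaults to `null`; if a dataset-derived scale is still wanted, the new `dataset_*_absmax` statistics added in this release can be used in its place. A minimal sketch, using only option strings documented in `configs/full.yaml` below:

```yaml
# sketch: opting back into dataset-derived per-species scales via the new absmax statistic
per_species_rescale_shifts: dataset_per_atom_total_energy_mean
per_species_rescale_scales: dataset_forces_absmax   # new in 0.6.0; see the option list in configs/full.yaml
```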
51 changes: 35 additions & 16 deletions configs/full.yaml
@@ -13,9 +13,11 @@ run_name: example-run-toluene
seed: 123 # model seed
dataset_seed: 456 # data set seed
append: true # set true if a restarted run should append to the previous log file
default_dtype: float32 # type of float to use, e.g. float32 and float64
allow_tf32: false # whether to use TensorFloat32 if it is available
# device: cuda # which device to use. Default: automatically detected cuda or "cpu"

# see https://arxiv.org/abs/2304.10061 for discussion of numerical precision
default_dtype: float64
model_dtype: float32
allow_tf32: true # consider setting to false if you plan to mix training/inference over any devices that are not NVIDIA Ampere or later

# == network ==

@@ -161,14 +163,17 @@ wandb: true
wandb_project: toluene-example # project name used in wandb
wandb_watch: false

# # using tensorboard for logging
# tensorboard: true

# see https://docs.wandb.ai/ref/python/watch
# wandb_watch_kwargs:
# log: all
# log_freq: 1
# log_graph: true

verbose: info # the same as python logging, e.g. warning, info, debug, error. case insensitive
log_batch_freq: 1 # batch frequency, how often to print training errors withinin the same epoch
log_batch_freq: 100 # batch frequency, how often to print training errors within the same epoch
log_epoch_freq: 1 # epoch frequency, how often to print
save_checkpoint_freq: -1 # frequency to save the intermediate checkpoint. no saving of intermediate checkpoints when the value is not positive.
save_ema_checkpoint_freq: -1 # frequency to save the intermediate ema checkpoint. no saving of intermediate checkpoints when the value is not positive.
@@ -207,9 +212,9 @@ early_stopping_upper_bounds:

# loss function
loss_coeffs: # different weights to use in a weighted loss functions
forces: 1 # if using PerAtomMSELoss, a default weight of 1:1 on each should work well
forces: 1.0 # if using PerAtomMSELoss, a default weight of 1:1 on each should work well
total_energy:
- 1
- 1.0
- PerAtomMSELoss
# note that the ratio between force and energy loss matters for the training process. One may consider using 1:1 with the PerAtomMSELoss. If the energy loss still significantly dominates the loss function in the initial epochs, tuning the energy loss weight lower helps the training a lot.

@@ -244,6 +249,15 @@ loss_coeffs:
# - L1Loss
# forces: 1.0

# You can schedule changes in the loss coefficients using a callback:
# In the "schedule" key each entry is a two-element list of:
# - the 1-based epoch index at which to start the new loss coefficients
# - the new loss coefficients as a dict
#
# start_of_epoch_callbacks:
# - !!python/object:nequip.train.callbacks.loss_schedule.SimpleLossSchedule {"schedule": [[2, {"forces": 0.0, "total_energy": 1.0}]]}
#

# output metrics
metrics_components:
- - forces # key
@@ -282,8 +296,9 @@ optimizer_weight_decay: 0
# setting to inf or null disables it
max_gradient_norm: null

# lr scheduler, currently only supports the two options listed below, if you need more please file an issue
# lr scheduler
# first: on-plateau, reduce lr by a factor of lr_scheduler_factor if metrics_key hasn't improved for lr_scheduler_patience epochs
# you can also set other options of the underlying PyTorch scheduler, for example lr_scheduler_threshold
lr_scheduler_name: ReduceLROnPlateau
lr_scheduler_patience: 100
lr_scheduler_factor: 0.5
@@ -304,35 +319,42 @@ per_species_rescale_scales_trainable: false
# whether the scales are trainable. Defaults to False. Optional
per_species_rescale_shifts_trainable: false
# whether the shifts are trainable. Defaults to False. Optional

per_species_rescale_shifts: dataset_per_atom_total_energy_mean
# initial atomic energy shift for each species. default to the mean of per atom energy. Optional
# the value can be a constant float value, an array for each species, or a string
# if numbers are explicitly provided, they must be in the same energy units as the training data
# string option include:
# * "dataset_per_atom_total_energy_mean", which computes the per atom average
# * "dataset_per_species_total_energy_mean", which automatically compute the per atom energy mean using a GP model
per_species_rescale_scales: dataset_forces_rms

per_species_rescale_scales: null
# initial atomic energy scale for each species. Optional.
# the value can be a constant float value, an array for each species, or a string
# if numbers are explicitly provided, they must be in the same energy units as the training data
# string option include:
# * "dataset_forces_absmax", which computes the dataset maxmimum force component magnitude
# * "dataset_per_atom_total_energy_std", which computes the per atom energy std
# * "dataset_per_species_total_energy_std", which uses the GP model uncertainty
# * "dataset_per_species_forces_rms", which compute the force rms for each species
# If not provided, defaults to dataset_per_species_force_rms or dataset_per_atom_total_energy_std, depending on whether forces are being trained.
# If not provided, defaults to null.

# per_species_rescale_kwargs:
# total_energy:
# alpha: 0.001
# max_iteration: 20
# stride: 100
# keywords for ridge regression decomposition of per specie energy. Optional. Defaults to 0.001. The value should be in the range of 1e-3 to 1e-2
# per_species_rescale_arguments_in_dataset_units: True
# if explicit numbers are given for the shifts/scales, this parameter must specify whether the given numbers are unitless shifts/scales or are in the units of the dataset. If ``True``, any global rescalings will correctly be applied to the per-species values.
# keywords for ridge regression decomposition of per species energy. Optional. Defaults to 0.001. The value should be in the range of 1e-3 to 1e-2

# global energy shift and scale
# When "dataset_total_energy_mean", the mean energy of the dataset. When None, disables the global shift. When a number, used directly.
# Warning: if this value is not None, the model is no longer size extensive
global_rescale_shift: null

# global energy scale. When "dataset_force_rms", the RMS of force components in the dataset. When "dataset_total_energy_std", the stdev of energies in the dataset. When null, disables the global scale. When a number, used directly.
# global energy scale. When "dataset_force_rms", the RMS of force components in the dataset.
# When "dataset_forces_absmax", the maximum force component magnitude in the dataset.
# When "dataset_total_energy_std", the stdev of energies in the dataset.
# When null, disables the global scale. When a number, used directly.
# If not provided, defaults to either dataset_force_rms or dataset_total_energy_std, depending on whether forces are being trained.
global_rescale_scale: dataset_forces_rms

@@ -361,6 +383,3 @@ global_rescale_scale_trainable: false
# per_species_rescale_shifts: null
# per_species_rescale_scales: null

# Options for e3nn's set_optimization_defaults. A dict:
# e3nn_optimization_defaults:
# explicit_backward: True