Releases: mlcommons/algorithmic-efficiency
algoperf-benchmark-0.1.5
Summary
- Finalized variant workload targets.
- Fix a bug in the `random_utils` helper function.
- For the Conformer PyTorch workload, set `inplace=True` on Dropout layers.
- Clear the CUDA cache at the beginning of each trial for PyTorch (see the sketch below).
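A minimal PyTorch sketch of the two memory-related changes above; the module and trial loop are stand-ins, not the benchmark's actual Conformer or runner code:

```python
import torch
import torch.nn as nn

# Stand-in for a Conformer sub-block: with inplace=True, Dropout overwrites
# its input tensor instead of allocating a fresh one, lowering peak memory.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
block = nn.Sequential(nn.Linear(256, 256), nn.Dropout(p=0.1, inplace=True)).to(device)

for trial in range(3):  # stand-in for the benchmark's tuning trials
  if torch.cuda.is_available():
    # Release cached, unused GPU memory from the previous trial so each
    # trial starts from a clean allocator state.
    torch.cuda.empty_cache()
  out = block(torch.randn(8, 256, device=device))
```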
What's Changed
- update speech variants target setting points by @priyakasimbeg in #727
- set num_workers for librispeech back to 4 by @priyakasimbeg in #736
- [fix] random_utils.py to `_signed_to_unsigned` by @tfaod in #739 (see the sketch below)
- Fix path in helper config for running experiments in bulk. by @priyakasimbeg in #740
- Finalize variants targets by @priyakasimbeg in #738
- Aiming to Fix Conformer OOM by @pomonam in #710
- Lint fixes by @priyakasimbeg in #742
- Add warning for PyTorch data loader num_workers flag. by @priyakasimbeg in #726
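For context on the `random_utils` fix in #739, a minimal sketch of a signed-to-unsigned seed conversion; the 32-bit assumption and the function body are illustrative, not the repo's exact implementation:

```python
# Illustrative only: fold possibly-negative integer seeds into the unsigned
# 32-bit range, since many RNG constructors reject negative seeds.
def _signed_to_unsigned(seed: int) -> int:
  return seed % 2**32

assert _signed_to_unsigned(-1) == 2**32 - 1
```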
Full Changelog: algoperf-benchmark-0.1.4...algoperf-benchmark-0.1.5
algoperf-benchmark-0.1.4
Upgrade CUDA version to CUDA 12.1:
- Upgrade CUDA version in Dockerfiles that will be used for scoring.
- Update JAX and PyTorch package version tags to use the local CUDA installation.
Add flag for completely disabling checkpointing.
- Note that we will run with checkpointing off at scoring time.
Update Deepspeech and Conformer variant target-setting configurations.
- Note that variant targets are not final.
Fix a bug in the scoring code so that it takes the best trial in a study for the external-tuning ruleset (see the sketch below).
Add instructions for submission.
Change the default number of workers for PyTorch data loaders to 0. Running ImageNet workloads with `num_workers > 0` may lead to incorrect eval results; see #732.
Update: for the speech workloads, the `pytorch_eval_num_workers` flag to `submission_runner.py` has to be set to a value > 0 to prevent a data loader crash in the JAX code (see the example below).
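A toy sketch of the scoring fix noted above: for the external-tuning ruleset, each study is scored by its best (fastest-to-target) trial. The names and data layout here are assumptions, not the repo's scoring code:

```python
# Illustrative only: pick each study's best trial, where a trial's score is
# its time-to-target (infinity if the trial never reached the target).
def best_trial_per_study(studies):
  # studies: {study_name: {trial_name: time_to_target_in_seconds}}
  return {study: min(trials.values()) for study, trials in studies.items()}

scores = best_trial_per_study({
    'study_0': {'trial_0': 1200.0, 'trial_1': 950.0},
    'study_1': {'trial_0': float('inf'), 'trial_1': 1100.0},
})
# -> {'study_0': 950.0, 'study_1': 1100.0}
```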
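And a short example of the `num_workers` guidance above. `DataLoader` and its `num_workers` argument are standard PyTorch API; the dataset here is a stand-in:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(8, 3, 224, 224))

# New default: single-process loading. On the ImageNet workloads,
# num_workers > 0 may produce incorrect eval results (#732).
train_loader = DataLoader(dataset, batch_size=4, num_workers=0)

# Speech workloads are the exception: pass --pytorch_eval_num_workers > 0
# to submission_runner.py so the eval loader uses worker processes and the
# JAX-side data loader crash is avoided. The effect, illustrated directly:
eval_loader = DataLoader(dataset, batch_size=4, num_workers=4)
```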
algoperf-benchmark-0.1.3
Update technical documentation.
Bug fixes:
- Fix workload variant names in Dockerfile.
- Fix ViT GLU OOM by reducing the batch size.
- Fix submission_runner stopping condition.
- Fix dropout rng in ViT and WMT.
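On the dropout RNG fix: in Flax, dropout only randomizes correctly when a dedicated 'dropout' RNG stream is supplied to `apply`. A minimal sketch with a stand-in module; the `rngs={'dropout': ...}` plumbing is real flax.linen API, but this is not the benchmark's ViT or WMT code:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class TinyBlock(nn.Module):  # stand-in, not the benchmark's ViT/WMT models
  @nn.compact
  def __call__(self, x, train: bool):
    x = nn.Dense(16)(x)
    # Dropout draws randomness from the 'dropout' RNG stream.
    return nn.Dropout(rate=0.1, deterministic=not train)(x)

rng = jax.random.PRNGKey(0)
params_rng, dropout_rng = jax.random.split(rng)
model = TinyBlock()
x = jnp.ones((2, 8))
variables = model.init({'params': params_rng, 'dropout': dropout_rng}, x, train=True)
# Forgetting rngs={'dropout': ...} here is the kind of bug such fixes address.
y = model.apply(variables, x, train=True, rngs={'dropout': dropout_rng})
```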
algoperf-benchmark-0.1.2
Add workload variants.
Add prize qualification logs for external tuning ruleset.
Note: FastMRI trials with dropout are not yet added due to #664.
Add functionality to Docker startup script for self_tuning ruleset.
Add self_tuning ruleset option to script that runs all workloads for scoring.
Data setup fixes.
Fix tests that check training differences in PyTorch and JAX on GPU.
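As a flavor of the fixed cross-framework tests, a toy parity check between PyTorch and JAX; the real tests compare training behavior on GPU and are far more involved:

```python
import numpy as np
import torch
import jax.numpy as jnp

x = np.random.RandomState(0).randn(4, 4).astype(np.float32)
out_torch = torch.tanh(torch.from_numpy(x)).numpy()
out_jax = np.asarray(jnp.tanh(x))
# Elementwise ops should agree across frameworks to within float tolerance.
assert np.allclose(out_torch, out_jax, atol=1e-6)
```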
algoperf-benchmark-0.1.0
First release of the AlgoPerf: Training algorithms benchmarking code.