Update pascal voc12 example #1125

vfdev-5 · 2020-06-12T22:10:53Z

Description:

- Updated the Pascal VOC12 example to use the idist API
- Refactored training files:
  - a single training script
  - added an exp_tracking module to dispatch calls to specific implementations: mlflow, plx, trains
  - updated docs/notes

Check list:

New tests are added (if a new feature is added)
New doc strings: description and/or example code are in RST format
Documentation is updated (if required)

sdesrozis

Some comments but it looks good 😉

sdesrozis · 2020-06-13T07:56:39Z

examples/references/segmentation/pascal_voc2012/README.md

+Training script contains `run` method required by [py_config_runner](https://github.com/vfdev-5/py_config_runner) to 
+run a script with a configuration. 
+
+The split between training script and configuration python file is the following. 


I wonder whether gin-config or hydra could help to compose the config script and run it.

I'll let you explore this option :)
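
For readers unfamiliar with the script/config split discussed in this thread, a minimal sketch follows. It assumes only what the README hunk states, namely that the training script exposes a `run` method taking the configuration. The `SimpleNamespace` object stands in for the configuration python file, and the attribute names `device` and `num_epochs` are illustrative, not the example's real config.

```python
from types import SimpleNamespace


def run(config, **kwargs):
    """Entry point in the `run(config, **kwargs)` shape used with py_config_runner."""
    history = []
    for epoch in range(1, config.num_epochs + 1):
        # A real script would build the model, optimizer and data loaders
        # from the config and run training/validation here.
        history.append((epoch, config.device))
    return history


# Stand-in for the configuration python file (hypothetical values).
config = SimpleNamespace(device="cuda", num_epochs=2)

if __name__ == "__main__":
    print(run(config))
```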

sdesrozis · 2020-06-13T07:58:13Z

examples/references/segmentation/pascal_voc2012/README.md

+- other parameters: device, number of epochs, etc
+
+Training script uses these components to setup and run training and validation loops. By default, 
+processing group with "nccl" backend is initialized for distributed configuration (even for a single GPU).


Maybe we could think about full CPU training on a cluster.

IMO there is very limited interest in that. Would it be one of your use cases?

That does not matter for now 😉
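
If CPU-only training were ever supported, the backend choice could be made from the configured device. The sketch below is an assumption, not part of this PR: `pick_backend` is a hypothetical helper, and "gloo" is used because it is the torch.distributed backend that supports CPU tensors.

```python
def pick_backend(device: str) -> str:
    """Choose a torch.distributed backend name from a device string."""
    # "nccl" only works with CUDA devices; "gloo" also covers CPU-only nodes.
    return "nccl" if device.startswith("cuda") else "gloo"


# Possible use with ignite.distributed (left commented out, needs ignite and torch):
# import ignite.distributed as idist
# with idist.Parallel(backend=pick_backend(config.device)) as parallel:
#     parallel.run(training, config)
```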

examples/references/segmentation/pascal_voc2012/experiments/mlflow/conda.yaml

examples/references/segmentation/pascal_voc2012/configs/train/baseline_resnet101.py

examples/references/segmentation/pascal_voc2012/configs/train/baseline_resnet101_sbd.py

* Added Windows/MacOSX CI for py3.7 only (#1113) * [WIP] Added Windows CI for py2.7 only * Excluded examples from windows ci * Update unittests.yml * Update unittests.yml * Fixed shell bash as suggested * Fixed failing tests on Win32 - added MNIST test for Win32 in Github actions - added tests on macosx in Github actions * Fixed isort * Fixes tests with IterableDataset * Skipped slow deterministic tests on win32 * skip failing timer tests on macos * fix macos platform name * fix _test_setup_logging * skip frequency tests on win platform * skip time tests on macos * fix flake8 * fix isort * Skip distrib tests for Win32 * skip time test for macos * Updated github actions yaml * skip modules for macos * Fixed bad skip of deterministic tests, reduced time for slow tests * Do not run dist tests on macosx Co-authored-by: Sylvain Desroziers <[email protected]> * [FR] Parallel helper tools (#1014) (#1116) * [FR] Parallel helper tools (#1014) * [WIP] auto and parallel dist modules * [WIP] auto optim * Added xla optimizer wrapper - other code updates * Updated auto and cifar10 example * - Fixed resume from - other cosmetics * Fixed bug with _XLADistributedOptimizer - updated default LR * autopep8 fix * Updated README and minor fixes * autopep8 fix * - Removed mnist distributed example - Reverted unintended modifications * Tests of auto methods * autopep8 fix * Tests, docs and code updates * autopep8 fix * Up code, test, cifar10 example and docs * Added option to stop the training - updated ci * Updated readme and fixed ci configs * - Updated code, README and remove old cifar10 Co-authored-by: vfdev-5 <[email protected]> Co-authored-by: AutoPEP8 <> * Fixes failing tests * Minor updates * Other minor updates * Example readme update and minor fixes * Added test on load_objects ddp to improve coverage * Added more tests for parallel launcher * Replaced pbars by logger * Updated link to cifar10 example * Fixes codecov upload * Updated coverage report type for gpu/tpu 
Co-authored-by: Sylvain Desroziers <[email protected]> * Fixes #1120 (#1122) * Fixes #1120 - Aligned idist args and method names to torch.distributed.launch * replace missing num_procs_per_node * black format * fix bug * replace num_nodes by nnodes Co-authored-by: Desroziers <[email protected]> Co-authored-by: Sylvain Desroziers <[email protected]> * reverse order of remove/save in Checkpoint handling (#1117) * reverse order of remove/save so there is never an n+1 checkpoint situation. * evict if new item is better than candidate for eviction. * swap order of updating saved and saving to ensure consistency of state * remove redundant method. Co-authored-by: vfdev <[email protected]> * Fix test auto tpu (#1126) * Fixed failing tpu tests * Updated docstring of cifar10 example * Auto pin_memory (#1129) * Auto pin_memory * autopep8 fix Co-authored-by: AutoPEP8 <> * fix auto pin_memory : idist.device().type should be used (#1131) * fix auto pin_memory : idist.device().type should be used * fix cuda in device * fix test * use idist.device().type to test * add missing () Co-authored-by: Desroziers <[email protected]> * Update pascal voc12 example (#1125) * [WIP][Pascal-VOC12] Update/refactor example * [WIP][Pascal-VOC12] Update/refactor example 2 * [WIP] Updated mlflow files * Removed unused files * Fixed flake and black * Removed unused import and fixed version for mlflow Co-authored-by: Sylvain Desroziers <[email protected]> * fix cifar10 model : num_classes missing (#1134) Co-authored-by: Desroziers <[email protected]> * Accuracy MultiLabel Handling and Error Message (#1132) * Updated check for multilabel and error message * Updated docstring and error message * Updated error message formatting Co-authored-by: Sylvain Desroziers <[email protected]> * Updated ImageNet example (#1138) * [WIP] Updated ImageNet example - minor fixes for Pascal VOC12 * Fixed flake8 * Updated pytorch-version-tests.yml to run cron every day at 00:00 UTC (#1141) Co-authored-by: Sylvain 
Desroziers <[email protected]> * Added check_compute_fn argument to EpochMetric and related metrics (#1140) * Added check_compute_fn argument to EpochMetric and related functions. * Updated docstrings * Added check_compute_fn to _BaseRegressionEpoch * Adding typing hints for check_compute_fn * Update roc_auc.py Co-authored-by: Sylvain Desroziers <[email protected]> Co-authored-by: vfdev <[email protected]> * Docs cosmetics (#1142) * Updated docs, replaced single quote by double quote if is code - fixed missing link to Engine - cosmetics * More doc updates * More updates * Fix batch size calculation error (#1137) * Fix batch size calculation error * Add tests for fixed batch size calculation * Fix tests * Test for num_workers * Fix nproc comparison * Improve docs * Fixed docstring Co-authored-by: vfdev <[email protected]> * Docs updates (#1139) * [WIP] Added teaser gif * [WIP] Updated README * [WIP] Updated README * [WIP] Updated docs * Reverted unintended pyproject.toml edits * Updated README and examples parts * More updates of README * Added badge to check pytorch/python compatible versions * Updated README * Added ref to blog "Using Optuna to Optimize PyTorch Ignite Hyperparameters" * Update README.md * Fixed bad internal link in examples * Updated README * Fixes docs (#1147) * Fixed bad link on teaser * Added manual_seed into docs * Issue #1115 : pbar persists due to specific rule in tqdm (notebook) when n < total (#1145) * Issue #1115 pbar persists in notebook due to specific rules when n < total * close pbar doesn't rise danger bar * fix when pbar.total is None Co-authored-by: vfdev <[email protected]> Co-authored-by: Desroziers <[email protected]> * Updated codebase such that torch>=1.3 (#1150) Co-authored-by: vfdev <[email protected]> * add wandb (#1152) wandb integration already exists, just adding it to the requirements file * Fixed typo and missing part of "Where to go next" (#1151) * Fixes #1153 (#1154) - temporary downgrade of scipy to 1.4.1 instead of 
1.5.0 * Use global_step as priority, if it exists (#1155) * Use global_step as priority, if it exists * Fix flake8 error * Style fix Co-authored-by: vfdev <[email protected]> * Fix TrainsSaver handling of Checkpoint's n_saved (#1135) * Utilize Trains framework callbacks to better support checkpoint saving and respect Checkpoint.n_saved * Update trains callbacks to new format * autopep8 fix * Fix trains mnist example (store checkpoints in local folder) * Use trains 0.15.1rc0 until PR is approved * Use CallbackType for Trains callback type resolution. Add unit test for Trains callbacks * Update trains version * Updated test_trains_saver_callbacks Co-authored-by: jkhenning <> Co-authored-by: vfdev <[email protected]> * Stateful handlers (#1156) * Stateful handlers * Added state_dict/load_state_dict tests for Checkpoint * integration test * Updated docstring and added include_self to ModelCheckpoint * An integreation test for checkpointing with stateful handlers * Black and flake8 Co-authored-by: vfdev-5 <[email protected]> * Bump version to 0.4rc.0.post1 * bump version to v0.4.0 🎉 Co-authored-by: vfdev <[email protected]> Co-authored-by: Sylvain Desroziers <[email protected]> Co-authored-by: Desroziers <[email protected]> Co-authored-by: Marijan Smetko <[email protected]> Co-authored-by: Anmol Joshi <[email protected]> Co-authored-by: Lavanya Shukla <[email protected]> Co-authored-by: Akihiro Matsukawa <[email protected]> Co-authored-by: Jake Henning <[email protected]>

vfdev-5 added 2 commits June 12, 2020 17:05

[WIP][Pascal-VOC12] Update/refactor example

03950a5

[WIP][Pascal-VOC12] Update/refactor example 2

742a95d

vfdev-5 marked this pull request as draft June 12, 2020 22:11

vfdev-5 force-pushed the update/pascal-voc12 branch from 4213a01 to 29769fe Compare June 12, 2020 22:13

[WIP] Updated mlflow files

3ee065c

vfdev-5 force-pushed the update/pascal-voc12 branch from 29769fe to 3ee065c Compare June 12, 2020 22:16

vfdev-5 added 2 commits June 13, 2020 00:18

Merge branch 'master' into update/pascal-voc12

3dd9a40

Removed unused files

bb9f838

vfdev-5 marked this pull request as ready for review June 12, 2020 22:27

vfdev-5 requested a review from sdesrozis June 12, 2020 22:28

Fixed flake and black

39ff8da

vfdev-5 force-pushed the update/pascal-voc12 branch from 52b28d9 to 39ff8da Compare June 12, 2020 22:58

sdesrozis approved these changes Jun 13, 2020


sdesrozis and others added 3 commits June 15, 2020 18:49

Merge branch 'master' into update/pascal-voc12

be73958

Removed unused import and fixed version for mlflow

1a97394

Merge branch 'master' into update/pascal-voc12

83d42d1

vfdev-5 merged commit f833124 into pytorch:master Jun 16, 2020

vfdev-5 deleted the update/pascal-voc12 branch June 16, 2020 13:59
