[Enhance] Add in-memory caching in dataloader #1694
Conversation
vinnamkim commented Feb 14, 2023
- Related issue: https://jira.devtools.intel.com/browse/CVS-103077
- Add in-memory caching logic into data loading.
- Update CLI argument parser for this feature.
- Add tests for this feature.
Signed-off-by: Kim, Vinnam <[email protected]>
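To make the idea concrete, here is a minimal sketch of a byte-bounded in-memory cache for decoded images. It is purely illustrative: the class and method names are assumptions, not the actual implementation in this PR.

```python
# Illustrative sketch only; the real implementation in this PR differs.
# Idea: keep decoded images in RAM up to a fixed byte budget, so each
# image is decoded from disk at most once across epochs.
from typing import Dict, Optional

import numpy as np


class InMemoryImageCache:  # hypothetical name
    def __init__(self, mem_cache_size: int) -> None:
        self._budget = mem_cache_size  # maximum bytes to hold in RAM
        self._used = 0
        self._store: Dict[str, np.ndarray] = {}

    def get(self, key: str) -> Optional[np.ndarray]:
        """Return the cached decoded image, or None on a miss."""
        return self._store.get(key)

    def put(self, key: str, img: np.ndarray) -> bool:
        """Cache the image if it fits under the budget; report success."""
        if key in self._store:
            return True
        if self._used + img.nbytes > self._budget:
            return False  # over budget: fall back to decoding every epoch
        self._store[key] = img
        self._used += img.nbytes
        return True
```

A dataset's `__getitem__` would then try `get(path)` first and only decode (and `put`) the image on a miss.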
- Did some refactoring
- Add docstrings
Signed-off-by: Kim, Vinnam <[email protected]>
Strictly speaking, the feature set had been frozen for OTX 1.0.
But as I see it, this could be considered just an enhancement, so I think we could include it before the code freeze. Could you change the title? [Feature] -> [Enhance]
Sorry for nitpicking.
Finally, could you let us know the impact of this change? How much faster is it?
Codecov Report: Base: 56.17% // Head: 59.23% // Increases project coverage by +3.05%.
Additional details and impacted files:
@@ Coverage Diff @@
## develop #1694 +/- ##
===========================================
+ Coverage 56.17% 59.23% +3.05%
===========================================
Files 494 487 -7
Lines 34699 35151 +452
===========================================
+ Hits 19493 20821 +1328
+ Misses 15206 14330 -876
I cannot make a simple claim about the performance gain; it largely depends on the dataset. The main factors are how many data samples can be cached under the memory constraint and the cost saved by skipping the image decode for each sample. There is usually a trade-off between the two: if the decode cost is high, the decoded image is normally huge, so even a few decoded images may not fit in the cache. That said, the decode cost is not strictly proportional to the image size. Therefore, I expect this feature to be mainly useful for smaller datasets. This is my experiment for a classification task with the Chest X-ray dataset.
From this table, you can see that …
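A rough back-of-envelope check of that trade-off, using assumed image sizes rather than measured numbers:

```python
# Assumed numbers for illustration only; not measurements from this PR.
budget = 1 * 1024**3                 # hypothetical 1 GiB cache budget

small = 224 * 224 * 3                # ~150 KB per decoded RGB thumbnail
large = 2048 * 2048 * 3              # ~12 MB per decoded high-res scan

print(budget // small)               # 7133: most of a small dataset fits
print(budget // large)               # 85: only a sliver of a large one fits
```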
Great implementation, indeed!
Thank you for the nice enhancement (feature :)) with a thorough documentation update and unit testing. LGTM!
BTW, it's quite a huge change, so we need more reviews.
Finally, could you run the e2e test locally and post the captured result? There might be some failures, though. (tox -e pre-merge -- tests/e2e would work, I suppose.)
@JihwanEom could you check a possible conflict w/ video frame caching? (I suppose not, but it needs a double check.)
Sorry, I made a mistake: they had switched positions in both tables. I fixed it.
The video data dict for video classification contains "dataset_items", not "dataset_item". Therefore, I think we need to add methods for multi-image input and output.
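If so, one option is a thin multi-image wrapper around the single-image cache interface. A sketch under that assumption, reusing the hypothetical InMemoryImageCache from the sketch above (all names are illustrative):

```python
# Hypothetical multi-frame helpers for video samples, where one sample
# carries several frames under "dataset_items" instead of "dataset_item".
from typing import List, Optional

import numpy as np


def put_images(cache: "InMemoryImageCache", key: str,
               imgs: List[np.ndarray]) -> bool:
    # Store each frame under an indexed sub-key; returns False if any
    # frame did not fit under the memory budget.
    return all(cache.put(f"{key}/{i}", img) for i, img in enumerate(imgs))


def get_images(cache: "InMemoryImageCache", key: str,
               n_frames: int) -> Optional[List[np.ndarray]]:
    # A single missing frame counts as a full miss, so the caller
    # decodes the whole sample again.
    frames = [cache.get(f"{key}/{i}") for i in range(n_frames)]
    return None if any(f is None for f in frames) else frames
```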
LGTM. Anyway, I tried to run it and got the following failure:
2023-02-15 10:36:39,641 | INFO : train()
2023-02-15 10:36:39,641 | INFO : init data cfg.
2023-02-15 10:36:39,641 | INFO : initializing....
2023-02-15 10:36:39,641 | INFO : called _init_recipe()
2023-02-15 10:36:39,641 | INFO : train type = INCREMENTAL
raise FileNotFoundError(msg_tmpl.format(filename))
FileNotFoundError: file "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/recipes/stages/classification/incremental.yaml" does not exist
Process SpawnProcess-20214:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/hpo/hpo_runner.py", line 155, in _run_train
train_func(hp_config, report_func)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/cli/utils/hpo.py", line 791, in run_trial
trainer.run()
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/cli/utils/hpo.py", line 691, in run
task.train(dataset=dataset, output_model=output_model, train_parameters=score_report_callback)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/api/utils/argument_checks.py", line 234, in validate
return function(**input_parameters_values_map)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/algorithms/classification/tasks/train.py", line 119, in train
results = self._run_task(stage_module, mode="train", dataset=dataset, parameters=train_parameters)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/algorithms/common/tasks/training_base.py", line 132, in _run_task
self._initialize(kwargs)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/algorithms/common/tasks/training_base.py", line 234, in _initialize
self._init_recipe()
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/algorithms/classification/tasks/inference.py", line 396, in _init_recipe
self._recipe_cfg = MPAConfig.fromfile(recipe)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/mpa/utils/config_utils.py", line 119, in fromfile
cfg_dict, cfg_text = MPAConfig._file2dict(filename, use_predefined_variables)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/mpa/utils/config_utils.py", line 28, in _file2dict
check_file_exist(filename)
File "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/mmcv/utils/path.py", line 23, in check_file_exist
raise FileNotFoundError(msg_tmpl.format(filename))
FileNotFoundError: file "/home/vinnamki/otx/training_extensions/.tox/pre-merge/lib/python3.8/site-packages/otx/recipes/stages/classification/incremental.yaml" does not exist
Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 24986 is closed
Thanks for the nice work!
I checked that it works well with multi-GPU and HPO.
[DO NOT MERGE YET]
@vinnamkim will fix issues in evaluation, NNCF, etc.
Thank you, Vinnam, but please hold this until we have more experimental results.
…ache-feature Signed-off-by: Kim, Vinnam <[email protected]>
- Change parsing destination from args.mem_cache_size to args.params.algo_backend.mem_cache_size (30c9f0d)
Signed-off-by: Kim, Vinnam <[email protected]>
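For context on the commit above: argparse only fills flat attributes, so a nested destination like args.params.algo_backend.mem_cache_size has to be assembled after parsing. A hypothetical sketch of that routing, including parsing a human-readable size (the flag name and helper are assumptions, not the actual OTX CLI):

```python
# Hypothetical sketch; the actual OTX argument parser works differently.
import argparse
from types import SimpleNamespace

UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30}


def parse_size(text: str) -> int:
    """Convert a human-readable size such as '1GB' into bytes."""
    for unit, factor in UNITS.items():
        if text.upper().endswith(unit):
            return int(float(text[: -len(unit)]) * factor)
    return int(text)  # plain byte count


parser = argparse.ArgumentParser()
parser.add_argument("--mem-cache-size", type=str, default="0")  # assumed flag
args = parser.parse_args(["--mem-cache-size", "1GB"])

# Re-home the flat flag under the nested destination the trainer reads.
args.params = SimpleNamespace(
    algo_backend=SimpleNamespace(mem_cache_size=parse_size(args.mem_cache_size))
)
print(args.params.algo_backend.mem_cache_size)  # 1073741824
```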
@goodsong81
It sounds feasible. Let's postpone this until the 1.0.0 release is done. As for the performance, I don't have any doubt about the gain from memory caching. Thanks.
Could you please resolve the conflicts with the latest develop?
Hi @goodsong81,
We are planning MPA refactoring (dismantling :p) and input/output handling refinement for the next update. There might be some changes, but not abrupt ones, I suppose.
We can't be sure, but we don't expect any major changes to the CLI arguments for a while, and the same goes for E2E testing. Changes on the configuration side may change the code flow as MPA is decommissioned. However, the config values are unlikely to change.
Thanks for letting me know, @goodsong81 @harimkang. I'll revisit this PR early in the next sprint, @wonjuleee.
Looks good to me, thanks.
Nice feature!