Fix saving large pipelines #1335

Merged: 3 commits, Jul 31, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `mrmr` feature selection working with categoricals ([#1311](https://github.com/tinkoff-ai/etna/pull/1311))
- Fix version of `statsforecast` to 1.4 to avoid dependency conflicts during installation ([#1313](https://github.com/tinkoff-ai/etna/pull/1313))
- Add inverse transformation into `predict` method of pipelines ([#1314](https://github.com/tinkoff-ai/etna/pull/1314))
- Allow saving large pipelines ([#1335](https://github.com/tinkoff-ai/etna/pull/1335))

### Removed
- Building docker images with cuda 10.2 ([#1306](https://github.com/tinkoff-ai/etna/pull/1306))
2 changes: 1 addition & 1 deletion etna/core/mixins.py
@@ -230,7 +230,7 @@ def _save_metadata(self, archive: zipfile.ZipFile):
             output_file.write(metadata_bytes)

     def _save_state(self, archive: zipfile.ZipFile):
-        with archive.open("object.pkl", "w") as output_file:
+        with archive.open("object.pkl", "w", force_zip64=True) as output_file:
Collaborator

Does it affect loading somehow? Do we have a test for it?

Contributor Author

It doesn't affect loading. We have tests for saving/loading, though not for large files; I tested large files manually.
As I understand it, the problem is that `archive.open` needs to know the size of the file up front in order to write correct metadata. Passing `force_zip64=True` makes sure it writes metadata suitable for a large file.
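
For reference, a minimal sketch of the behaviour described above, using only the standard library. It is not the etna code: `save_state` and `load_state` are hypothetical helpers, `pickle` stands in for `dill`, and the payload is small, so it only shows that an entry written with `force_zip64=True` is read back without any extra flag.

```python
import io
import pickle
import zipfile


def save_state(obj, archive: zipfile.ZipFile, name: str = "object.pkl") -> None:
    """Hypothetical helper mirroring the shape of ``_save_state``."""
    # The entry is streamed, so zipfile cannot know its final size when the
    # header is written. force_zip64=True asks for a header format that can
    # describe an entry larger than 2 GiB.
    with archive.open(name, "w", force_zip64=True) as output_file:
        pickle.dump(obj, output_file)


def load_state(archive: zipfile.ZipFile, name: str = "object.pkl"):
    """Hypothetical helper: reading takes no ZIP64-specific argument."""
    with archive.open(name, "r") as input_file:
        return pickle.load(input_file)


# Small in-memory round trip (the payload is illustrative only).
payload = {"horizon": 7, "weights": list(range(1000))}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    save_state(payload, archive)

buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as archive:
    restored = load_state(archive)
```

This does not exercise the large-file path itself; that would need an entry above 2 GiB, which is why it was checked manually.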

             dill.dump(self, output_file)

     def save(self, path: pathlib.Path):
4 changes: 2 additions & 2 deletions etna/models/mixins.py
@@ -630,7 +630,7 @@ def get_model(self) -> Any:


 class SaveNNMixin(SaveMixin):
-    """Implementation of ``AbstractSaveable`` torch related classes.
+    """Implementation of ``AbstractSaveable`` torch related classes.

     It saves object to the zip archive with 2 files:

@@ -642,7 +642,7 @@ class SaveNNMixin(SaveMixin):
     def _save_state(self, archive: zipfile.ZipFile):
         import torch

-        with archive.open("object.pt", "w") as output_file:
+        with archive.open("object.pt", "w", force_zip64=True) as output_file:
             torch.save(self, output_file, pickle_module=dill)

     @classmethod