Option to remove data from autocausality object after fit #251

Closed
wants to merge 1 commit into from

Conversation

julianteichgraber
Collaborator

@julianteichgraber julianteichgraber commented Apr 12, 2023

Problem

autocausality has memory issues because the fitted estimator automatically keeps the data after fitting. DoWhy and Scikit-Learn estimators do not do this, and there is no immediate need for it.

Proposed changes

  • Add an option to the fit method to remove the data after fitting.
  • Remove the initialisation of autocausality with data, since it is never used. Passing data only through the fit method suffices and is consistent with DoWhy and Scikit-Learn.
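
As a rough illustration of the two changes above, here is a minimal sketch. It is not the actual autocausality code: the class `SketchEstimator`, the `keep_data` flag, and the attributes `data` and `mean_` are hypothetical names chosen for this example. The estimator receives data only through `fit`, and the caller can ask for the reference to be dropped once fitting is done.

```python
from typing import List, Optional


class SketchEstimator:
    """Hypothetical stand-in for a fitted estimator; not the autocausality class."""

    def __init__(self) -> None:
        # No data is passed at construction time; it only arrives through fit().
        self.data: Optional[List[float]] = None
        self.mean_: Optional[float] = None

    def fit(self, data: List[float], keep_data: bool = True) -> "SketchEstimator":
        # Hold the data while fitting (a real estimator would train models here).
        self.data = data
        self.mean_ = sum(data) / len(data)
        if not keep_data:
            # Drop the reference so the fitted object no longer carries the dataset.
            self.data = None
        return self


light = SketchEstimator().fit([1.0, 2.0, 3.0], keep_data=False)
heavy = SketchEstimator().fit([1.0, 2.0, 3.0])
```

Defaulting `keep_data` to `True` in this sketch keeps the existing behaviour unless the caller opts out; whether the real option should default the other way is a design decision for the PR.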

In the future, I will consider moving the train-test split into CausalityDataset so that test_df always stays available.

Types of changes

What types of changes does your code introduce to Auto-Causality?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • I have read the CONTRIBUTING doc
  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@julianteichgraber julianteichgraber linked an issue Apr 12, 2023 that may be closed by this pull request
@EgorKraevTransferwise
Collaborator

Would it be worth adding a method that deletes the data after the fit? That would cover the case where you want to fit, keep working with the data, and then pickle the fitted model without the data.
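
A minimal sketch of that workflow, under stated assumptions: `SketchModel`, `clear_data`, `coef_`, and the 80/20 split are hypothetical and only illustrate the idea; only the names `train_df` and `test_df` echo the description above, and plain lists stand in for DataFrames.

```python
import pickle
from typing import List, Optional


class SketchModel:
    """Hypothetical fitted model that keeps its data until told to drop it."""

    def __init__(self) -> None:
        self.train_df: Optional[List[float]] = None
        self.test_df: Optional[List[float]] = None
        self.coef_: Optional[float] = None

    def fit(self, rows: List[float]) -> "SketchModel":
        # Simple 80/20 split, standing in for a real train-test split.
        split = max(1, int(0.8 * len(rows)))
        self.train_df = rows[:split]
        self.test_df = rows[split:]
        self.coef_ = sum(self.train_df) / len(self.train_df)
        return self

    def clear_data(self) -> None:
        # Drop the stored data but keep the fitted parameters.
        self.train_df = None
        self.test_df = None


model = SketchModel().fit([float(i) for i in range(10_000)])
size_with_data = len(pickle.dumps(model))      # data still attached

model.clear_data()                             # done working with the data
size_without_data = len(pickle.dumps(model))   # only fitted state remains

restored = pickle.loads(pickle.dumps(model))
```

Keeping deletion as a separate method, rather than only a flag on `fit`, lets the caller decide when the data is no longer needed.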

@AlxdrPolyakov AlxdrPolyakov deleted the memory-load-reduce branch September 5, 2024 12:28
Development

Successfully merging this pull request may close these issues.

Stop saving copies of the dataset in fitted estimators
3 participants