Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fast-path to format data #731

Merged
merged 6 commits into from
Feb 27, 2023
Merged

Add fast-path to format data #731

merged 6 commits into from
Feb 27, 2023

Conversation

coroa
Copy link
Collaborator

@coroa coroa commented Feb 22, 2023

Co-authored-by: Matthew Gidden [email protected]

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Name of contributors Added to AUTHORS.rst
  • Description in RELEASE_NOTES.md Added

Description of PR

Add a fast-path to format_data for initialization with a multi-index based Series or DataFrame that has all the required columns.

I set the base branch for this PR to PR #730 to highlight the small additional changes necessary.

@codecov
Copy link

codecov bot commented Feb 22, 2023

Codecov Report

Merging #731 (ab6b32e) into main (e07d3b9) will decrease coverage by 0.1%.
The diff coverage is 97.4%.

@@           Coverage Diff           @@
##            main    #731     +/-   ##
=======================================
- Coverage   95.0%   95.0%   -0.1%     
=======================================
  Files         59      59             
  Lines       6020    6037     +17     
=======================================
+ Hits        5725    5741     +16     
- Misses       295     296      +1     
Impacted Files Coverage Δ
pyam/utils.py 92.7% <97.4%> (+<0.1%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

pyam/utils.py Outdated Show resolved Hide resolved
Base automatically changed from split-format-data to main February 24, 2023 12:03
@gidden
Copy link
Member

gidden commented Feb 24, 2023

Hey @coroa - I hope I didn't clobber your commits here by merging your previous PR first. Could you rebase this one on main and then I can provide a review? Thanks!

@gidden gidden marked this pull request as ready for review February 25, 2023 08:38
Copy link
Member

@gidden gidden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @coroa - I might have a small preference on style (will add suggestion), but not blocking.

Is it possible to add a test for the untested code path?

pyam/utils.py Outdated Show resolved Hide resolved
pyam/utils.py Outdated Show resolved Hide resolved
Copy link
Member

@danielhuppmann danielhuppmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! If I understand it correctly, the way to access the fast-pass is to set the index before initialization, right?

@coroa
Copy link
Collaborator Author

coroa commented Feb 27, 2023

Looks good to me! If I understand it correctly, the way to access the fast-pass is to set the index before initialization, right?

Indeed. (or even better keep it as it is, ie never reset it :))

@coroa
Copy link
Collaborator Author

coroa commented Feb 27, 2023

Rebased to new main. Good to merge from my side.

@danielhuppmann
Copy link
Member

Sorry, my earlier comment was badly phrased... What I meant was the following:

In #726, @gidden added an option fast=False to the IamDataFrame initialization to explicitly instruct pyam to use the fast-pass (skip some validations) - now, this is implicit. Which means the fast-pass will automatically be applied by any method using _finalize(append=False) (see here) including aggregation and algebraic operations - but it is not possible to use the fast-pass when initializing from a file (because pandas reads a dataframe).

I think that this is perfectly fine behavior - just wanted to highlight this (or stand corrected if I'm on the wrong track).

Fine to merge (and maybe add a "force-fast-pass"-arg later). Thanks!

@coroa
Copy link
Collaborator Author

coroa commented Feb 27, 2023

Sorry, my earlier comment was badly phrased... What I meant was the following:

In #726, @gidden added an option fast=False to the IamDataFrame initialization to explicitly instruct pyam to use the fast-pass (skip some validations) - now, this is implicit. Which means the fast-pass will automatically be applied by any method using _finalize(append=False) (see here) including aggregation and algebraic operations - but it is not possible to use the fast-pass when initializing from a file (because pandas reads a dataframe).

I think that this is perfectly fine behavior - just wanted to highlight this (or stand corrected if I'm on the wrong track).

Fine to merge (and maybe add a "force-fast-pass"-arg later). Thanks!

You are spot on. The fast-path is not improving file read-in speed (as-is), but only data passing within pyam and pandas, where the index is preserved, like with the __finalize__ calls you are highlighting.

@coroa coroa merged commit ca5205c into main Feb 27, 2023
@coroa coroa deleted the introduce-fast-path branch February 27, 2023 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants