Add fast-path to format data #731

coroa · 2023-02-22T11:21:50Z

Co-authored-by: Matthew Gidden [email protected]

Please confirm that this PR has done the following:

Tests Added
Documentation Added
~~Name of contributors Added to AUTHORS.rst~~
~~Description in RELEASE_NOTES.md Added~~

Description of PR

Add a fast-path to format_data for initialization with a multi-index based Series or DataFrame that has all the required columns.
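
As a rough illustration of the idea, the sketch below checks whether the incoming pandas object already carries a MultiIndex with every required level and, if so, skips rebuilding the index. It is not the code from this PR: the level names in `REQUIRED_LEVELS`, the `value` column name, and the helpers `has_required_index` and `format_data_sketch` are assumptions made for the example.

```python
import pandas as pd

# Assumed level names for illustration; the real required columns live in pyam.
REQUIRED_LEVELS = ["model", "scenario", "region", "variable", "unit", "year"]


def has_required_index(data, required=REQUIRED_LEVELS):
    """True if `data` is indexed by a MultiIndex containing all required levels."""
    return isinstance(data.index, pd.MultiIndex) and set(required).issubset(
        data.index.names
    )


def format_data_sketch(data, required=REQUIRED_LEVELS):
    """Return a `value` Series indexed by the required levels (hypothetical helper)."""
    if has_required_index(data, required):
        # Fast path: the index already exists, so only normalise the level
        # order and drop missing values.
        series = data if isinstance(data, pd.Series) else data["value"]
        extra = [name for name in series.index.names if name not in required]
        return series.reorder_levels(required + extra).dropna()
    # Slow path (expects a DataFrame): build the index from plain columns.
    return data.set_index(required)["value"].dropna()


long_df = pd.DataFrame(
    {
        "model": ["m", "m"],
        "scenario": ["s", "s"],
        "region": ["World", "World"],
        "variable": ["Primary Energy", "Primary Energy"],
        "unit": ["EJ/yr", "EJ/yr"],
        "year": [2005, 2010],
        "value": [1.0, 6.0],
    }
)

slow = format_data_sketch(long_df)  # plain columns: slow path
shuffled = slow.reorder_levels(REQUIRED_LEVELS[::-1])  # indexed, levels reversed
fast = format_data_sketch(shuffled)  # already indexed: fast path
```

Both calls end up with the same level order and values; only the second one avoids the `set_index` step.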

~~I set the base branch for this PR to PR #730 to highlight the small additional changes necessary.~~

codecov · 2023-02-22T11:57:28Z

Codecov Report

Merging #731 (ab6b32e) into main (e07d3b9) will decrease coverage by 0.1%.
The diff coverage is 97.4%.

@@           Coverage Diff           @@
##            main    #731     +/-   ##
=======================================
- Coverage   95.0%   95.0%   -0.1%     
=======================================
  Files         59      59             
  Lines       6020    6037     +17     
=======================================
+ Hits        5725    5741     +16     
- Misses       295     296      +1

| Impacted Files | Coverage Δ |
|---|---|
| pyam/utils.py | `92.7% <97.4%> (+<0.1%)` ⬆️ |

gidden · 2023-02-24T15:54:15Z

Hey @coroa - I hope I didn't clobber your commits here by merging your previous PR first. Could you rebase this one on main and then I can provide a review? Thanks!

gidden

Thanks for this @coroa - I might have a small preference on style (will add suggestion), but not blocking.

Is it possible to add a test for the untested code path?

pyam/utils.py

danielhuppmann

Looks good to me! If I understand it correctly, the way to access the fast-path is to set the index before initialization, right?

coroa · 2023-02-27T19:23:06Z

> Looks good to me! If I understand it correctly, the way to access the fast-path is to set the index before initialization, right?

Indeed. (or even better keep it as it is, ie never reset it :))
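
A small pandas-only sketch of that exchange, under assumed level names (the list `LEVELS` and the sample values are illustrative and not taken from pyam): setting the index up front yields a MultiIndex-based Series, ordinary arithmetic keeps that index, and `reset_index()` is the step that would throw it away.

```python
import pandas as pd

# Assumed level names, chosen only for this example.
LEVELS = ["model", "scenario", "region", "variable", "unit", "year"]

flat = pd.DataFrame(
    {
        "model": ["m", "m"],
        "scenario": ["s", "s"],
        "region": ["World", "World"],
        "variable": ["Primary Energy", "Primary Energy"],
        "unit": ["EJ/yr", "EJ/yr"],
        "year": [2005, 2010],
        "value": [1.0, 6.0],
    }
)

# Set the index once, before handing the data on.
indexed = flat.set_index(LEVELS)["value"]

# Element-wise arithmetic preserves the MultiIndex.
scaled = indexed * 1000

# Resetting the index returns a flat frame with a default RangeIndex.
flattened_again = scaled.reset_index()
```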

coroa · 2023-02-27T19:29:29Z

Rebased to new main. Good to merge from my side.

danielhuppmann · 2023-02-27T20:07:11Z

Sorry, my earlier comment was badly phrased... What I meant was the following:

In #726, @gidden added an option `fast=False` to the IamDataFrame initialization to explicitly instruct pyam to use the fast-path (skipping some validations). Now this is implicit, which means the fast-path will automatically be applied by any method using `_finalize(append=False)` (see here), including aggregation and algebraic operations. It is not possible, however, to use the fast-path when initializing from a file (because pandas reads a dataframe).

I think that this is perfectly fine behavior - just wanted to highlight this (or stand corrected if I'm on the wrong track).

Fine to merge (and maybe add a "force-fast-path" argument later). Thanks!

coroa · 2023-02-27T23:40:21Z

> Sorry, my earlier comment was badly phrased... What I meant was the following:
>
> In #726, @gidden added an option `fast=False` to the IamDataFrame initialization to explicitly instruct pyam to use the fast-path (skipping some validations). Now this is implicit, which means the fast-path will automatically be applied by any method using `_finalize(append=False)` (see here), including aggregation and algebraic operations. It is not possible, however, to use the fast-path when initializing from a file (because pandas reads a dataframe).
>
> I think that this is perfectly fine behavior - just wanted to highlight this (or stand corrected if I'm on the wrong track).
>
> Fine to merge (and maybe add a "force-fast-path" argument later). Thanks!

You are spot on. As it stands, the fast-path does not improve file read-in speed; it only speeds up data passing within pyam and pandas where the index is preserved, as with the __finalize__ calls you are highlighting.
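
The distinction can be seen with plain pandas. In this sketch (the CSV content and the level names in `LEVELS` are made up for the example), data read from a file arrives with a default index, while a result computed from an already indexed Series keeps its MultiIndex and would therefore qualify for an index-based shortcut.

```python
import io

import pandas as pd

# Assumed level names and sample data, for illustration only.
LEVELS = ["model", "scenario", "region", "variable", "unit", "year"]
csv_text = (
    "model,scenario,region,variable,unit,year,value\n"
    "m,s,A,Primary Energy,EJ/yr,2010,1.0\n"
    "m,s,B,Primary Energy,EJ/yr,2010,2.0\n"
)

# Reading from a file yields plain columns and a default RangeIndex.
from_file = pd.read_csv(io.StringIO(csv_text))

# Once indexed, an aggregation over regions returns a MultiIndex-based Series.
series = from_file.set_index(LEVELS)["value"]
keep = [name for name in LEVELS if name != "region"]
total = series.groupby(level=keep).sum()
```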

coroa requested review from danielhuppmann and gidden February 22, 2023 11:23

coroa force-pushed the introduce-fast-path branch from 58e27b0 to 8bd6d4c Compare February 22, 2023 11:35

coroa mentioned this pull request Feb 22, 2023

Next iteration at a fast format_data #727

Closed

4 tasks

coroa mentioned this pull request Feb 22, 2023

Improve performance of format_data() #729

Merged

1 task

coroa force-pushed the split-format-data branch from c841dcb to 3146714 Compare February 22, 2023 12:40

coroa force-pushed the introduce-fast-path branch from 0f0e84c to a719d53 Compare February 22, 2023 12:42

danielhuppmann reviewed Feb 23, 2023

pyam/utils.py (outdated)

Base automatically changed from split-format-data to main February 24, 2023 12:03

coroa force-pushed the introduce-fast-path branch from a719d53 to 27d7c2c Compare February 24, 2023 22:58

gidden marked this pull request as ready for review February 25, 2023 08:38

gidden approved these changes Feb 25, 2023

gidden reviewed Feb 25, 2023

pyam/utils.py (outdated)

danielhuppmann reviewed Feb 27, 2023

pyam/utils.py (outdated)

danielhuppmann approved these changes Feb 27, 2023

coroa and others added 6 commits February 27, 2023 20:27

150e90d: Add fast-path to format data. Co-authored-by: Matthew Gidden <[email protected]>

a011b97: Add missing dropna and fix column order

3c19886: Style suggestion. Black puts it into one line. No haggling. Co-authored-by: Matthew Gidden <[email protected]>

3effa08: Apply suggestions. Co-authored-by: Daniel Huppmann <[email protected]>

4bc9185: Make choice to examine index levels explicit

ab6b32e: Add entry to release notes

coroa force-pushed the introduce-fast-path branch from 81b3849 to ab6b32e Compare February 27, 2023 19:28

coroa merged commit ca5205c into main Feb 27, 2023

coroa deleted the introduce-fast-path branch February 27, 2023 23:42

This was referenced Aug 3, 2023

Bump pyam-iamc to >= 1.9.0 iiasa/climate-assessment#36

Merged

Helpful accessors confounded by pandas regression #762

Closed
