Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor initialization for simpler maintenance #730

Merged
merged 9 commits into from
Feb 24, 2023
Merged

Conversation

coroa
Copy link
Collaborator

@coroa coroa commented Feb 22, 2023

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Name of contributors Added to AUTHORS.rst
  • Description in RELEASE_NOTES.md Added

Description of PR

On-top of #729. Rebased to main.

Splits utils.format_data into 6 different functions:

  1. _convert_r_columns(df) - Check and convert R-style year columns
  2. _knead_data(df, **kwargs) - Replace, rename and concat according to user arguments
  3. _format_from_database(df) - Post-process database results
  4. _intuit_column_groups(df, index) - Check and categorise columns in dataframe
  5. _format_data_to_series(df, index) - Convert a long or wide pandas dataframe to a series with the required columns
  6. _validate_complete_index(df)

Functionally it should be neutral. Make sure to look at individual commits to follow the refactoring trail.

@codecov
Copy link

codecov bot commented Feb 22, 2023

Codecov Report

Merging #730 (58eb241) into main (7a97516) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #730   +/-   ##
=====================================
  Coverage   95.0%   95.0%           
=====================================
  Files         59      59           
  Lines       6014    6020    +6     
=====================================
+ Hits        5717    5725    +8     
+ Misses       297     295    -2     
Impacted Files Coverage Δ
pyam/core.py 95.4% <100.0%> (ø)
pyam/utils.py 92.6% <100.0%> (+0.7%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@gidden
Copy link
Member

gidden commented Feb 22, 2023

This LGTM but best if @danielhuppmann approves!

Copy link
Member

@danielhuppmann danielhuppmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few suggestions inline.

Also, maybe move _validate_complete_index(df) above format_data() to have all components defined (in the order as they are called) before the actual function.

pyam/utils.py Outdated Show resolved Hide resolved
pyam/utils.py Outdated Show resolved Hide resolved
pyam/utils.py Outdated Show resolved Hide resolved
pyam/utils.py Outdated Show resolved Hide resolved
pyam/utils.py Outdated Show resolved Hide resolved
df = df.stack(dropna=True)
df.name = "value"
df.index.names = df.index.names[:-1] + [time_col]
df, time_col, extra_cols = _format_data_to_series(df, index)

# cast value column to numeric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also put these two checks into their own function _check_data_integrity()...

pyam/utils.py Outdated Show resolved Hide resolved
pyam/utils.py Outdated Show resolved Hide resolved
@gidden
Copy link
Member

gidden commented Feb 24, 2023

LGTM - thanks @coroa ! Will merge after tests pass.

@gidden gidden merged commit 128fb19 into main Feb 24, 2023
@gidden gidden deleted the split-format-data branch February 24, 2023 12:03
@gidden gidden restored the split-format-data branch February 24, 2023 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants