Add xlsxwriter to improve to_excel performance #701

phackstock · 2022-09-12T11:30:28Z

Please confirm that this PR has done the following:

~~Tests Added (none needed except for possibly benchmarks)~~
~~Documentation Added~~
Name of contributors Added to AUTHORS.rst
Description in RELEASE_NOTES.md Added

Description of PR

Added xlsxwriter to the dependencies of pyam. pandas prefers xlsxwriter over openpyxl for writing xlsx files when it is installed, so it is picked up automatically without any further changes.
According to benchmarks (https://exchangetuts.com/python-fastest-way-to-write-pandas-dataframe-to-excel-on-multiple-sheets-1640154784194443), xlsxwriter is significantly faster than openpyxl.
Should I set up some benchmarks of our own to test it for pyam?
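
For reference, the engine selection described above can be sketched roughly as follows. This is a minimal illustration, assuming pandas' default behaviour of preferring xlsxwriter for writing `.xlsx` files when it is importable and falling back to openpyxl otherwise. `preferred_xlsx_writer` and its `is_installed` parameter are hypothetical names used only here; they are not part of pandas or pyam.

```python
import importlib.util


def preferred_xlsx_writer(is_installed=None):
    """Return the name of the xlsx writer engine that would be preferred.

    Hypothetical helper: `is_installed` is an injectable predicate so the
    selection order can be checked without installing either package.
    """
    if is_installed is None:
        # Default check: is the package importable in this environment?
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    # Assumed preference order: xlsxwriter first, openpyxl as fallback.
    for engine in ("xlsxwriter", "openpyxl"):
        if is_installed(engine):
            return engine
    return None  # no xlsx writer available
```

Calling `preferred_xlsx_writer()` in the environment in question should indicate which engine a plain `to_excel` call is likely to use.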

codecov · 2022-09-12T11:40:08Z

Codecov Report

Merging #701 (a8c77af) into main (759120f) will increase coverage by 0.0%.
The diff coverage is 60.0%.

@@          Coverage Diff          @@
##            main    #701   +/-   ##
=====================================
  Coverage   94.8%   94.9%           
=====================================
  Files         58      58           
  Lines       5856    5853    -3     
=====================================
- Hits        5557    5555    -2     
+ Misses       299     298    -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| pyam/core.py | `94.7% <33.3%> (-0.2%)` | ⬇️ |
| pyam/utils.py | `91.7% <100.0%> (+0.5%)` | ⬆️ |
| tests/test_io.py | `100.0% <100.0%> (ø)` | |


danielhuppmann · 2022-09-13T07:27:28Z

I just pip-installed pyam in a clean environment and xlsxwriter was installed automatically already. Not sure if it's necessary to add it as an explicit dependency?

phackstock · 2022-09-13T07:38:23Z

Interesting, it's not a dependency of pandas (https://github.com/pandas-dev/pandas/blob/main/setup.cfg#L33).
Maybe it was installed as a dependency by some other library.
I just tried the same, installing pyam in a clean virtual environment, and it did not install xlsxwriter for me. Maybe it's platform-dependent.

danielhuppmann · 2022-09-13T08:01:50Z

You're right, looks like I was confused between two environments. Running some profiling now.

danielhuppmann

Note that openpyxl is hard-coded in line

pyam/pyam/core.py

Line 195 in 759120f

excel_file = pd.ExcelFile(data, engine="openpyxl")

Also, I guess it's possible to remove openpyxl as a dependency?

I just ran some tests and can confirm that

- removing the hard-coded engine works whether or not xlsxwriter is installed
- performance is significantly better if xlsxwriter is installed, with no need to specify it as an engine explicitly (on a 30MB xlsx file, the resulting file is smaller by 50%, time is -45%, CPU is -80%)
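
A comparison like the one above could be repeated with a small harness along these lines. This is only a sketch under stated assumptions: `time_call` and `benchmark_engines` are hypothetical helper names, `df` is assumed to be any object with a `to_excel(path, engine=...)` method (for example a pandas DataFrame), and only wall-clock time and file size are measured, not CPU usage.

```python
import importlib.util
import os
import tempfile
import time


def time_call(func, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_engines(df, engines=("openpyxl", "xlsxwriter"), repeats=3):
    """Write `df` once per available engine; return {engine: (seconds, bytes)}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for engine in engines:
            # Skip engines that are not importable in this environment.
            if importlib.util.find_spec(engine) is None:
                continue
            path = os.path.join(tmp, f"{engine}.xlsx")
            seconds = time_call(
                lambda: df.to_excel(path, engine=engine), repeats=repeats
            )
            results[engine] = (seconds, os.path.getsize(path))
    return results
```

Passing a reasonably large DataFrame to `benchmark_engines` gives one `(seconds, file size)` pair per installed engine, which is enough to check whether the time and size differences hold for typical pyam data.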

RELEASE_NOTES.md

Co-authored-by: Daniel Huppmann <[email protected]>

danielhuppmann · 2022-09-13T15:59:43Z

There is another usage of pd.ExcelWriter where the engine is not explicitly specified. Please make the two usages consistent.

pyam/pyam/core.py

Line 2377 in 759120f

excel_writer, close = pd.ExcelWriter(excel_writer), True
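
One way to keep the usages consistent is to create the writer in a single place with the engine named explicitly. The following is a rough sketch, not the code that was merged: `open_excel_writer` is a hypothetical helper, and the `writer_cls` parameter exists only so the logic can be exercised without pandas installed. It assumes `pd.ExcelWriter(path, engine=...)` as the default constructor.

```python
def open_excel_writer(target, engine="xlsxwriter", writer_cls=None):
    """Return `(writer, close)` for a path or an already-open writer.

    `close` is True only if the writer was created here, so the caller
    knows whether it is responsible for closing it.
    """
    if writer_cls is None:
        # Imported lazily so the helper can be tested with a stand-in class.
        import pandas as pd

        writer_cls = pd.ExcelWriter

    if isinstance(target, writer_cls):
        # Caller passed an open writer; leave its lifetime to the caller.
        return target, False

    # Caller passed a path; create the writer with an explicit engine.
    return writer_cls(target, engine=engine), True
```

With such a helper, every call site gets the same engine, and changing the default later touches one line.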

pyam/core.py

Co-authored-by: Daniel Huppmann <[email protected]>

phackstock · 2022-09-14T09:12:19Z

Updated the usage of xlsxwriter in pd.ExcelWriter.
There's one more thing that I found:

https://github.com/IAMconsortium/pyam/blob/main/pyam/utils.py#L135

is there a test for utils.write_sheet?

danielhuppmann · 2022-09-14T09:15:57Z

> is there a test for utils.write_sheet?

It's tested implicitly via

pyam/tests/test_io.py

Line 56 in 759120f

def test_io_xlsx(test_df, meta_args, tmpdir):

The line you found was a hacky attempt by me to make xlsx files look "nice" by having useful column widths.
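
To illustrate what setting "useful column widths" can look like with xlsxwriter, here is a minimal sketch. It is not the code in `pyam/utils.py`: `column_widths` and `apply_widths` are hypothetical helpers, and the padding and maximum width are arbitrary assumptions. The only library call relied on is xlsxwriter's `worksheet.set_column(first_col, last_col, width)`.

```python
def column_widths(header, rows, padding=2, max_width=50):
    """Compute one width per column from the longest cell text."""
    widths = [len(str(name)) for name in header]
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(str(value)))
    # Add a little padding and cap very wide columns.
    return [min(width + padding, max_width) for width in widths]


def apply_widths(worksheet, widths):
    """Apply the widths to an xlsxwriter worksheet, one column at a time."""
    for i, width in enumerate(widths):
        worksheet.set_column(i, i, width)
```

When writing through pandas with the xlsxwriter engine, the worksheet object should be available as `writer.sheets[sheet_name]` after the sheet has been written.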

phackstock · 2022-09-14T09:17:13Z

Ok, should I check if using xlsxwriter triggered the error?

danielhuppmann · 2022-09-14T09:18:22Z

Let me give this a quick try, I think you have more urgent things on your to-do list...

phackstock · 2022-09-14T09:19:45Z

True, just thought this might be a quick one to get off the list ...

Feature/add xlsxwriter width

Add xlsxwriter as dependency

017171d

phackstock requested a review from danielhuppmann September 12, 2022 11:30

Update RELEASE_NOTES

6fa970b

danielhuppmann reviewed Sep 13, 2022


phackstock and others added 2 commits September 13, 2022 14:13

Apply suggestions from code review

1be680e

Co-authored-by: Daniel Huppmann <[email protected]>

Change write excel engine to xlsxwriter

68abc88

Make xlsxwriter usage explicit

0f09847

danielhuppmann reviewed Sep 14, 2022


danielhuppmann assigned phackstock Sep 14, 2022

Update pyam/core.py

0db5751

Co-authored-by: Daniel Huppmann <[email protected]>

danielhuppmann and others added 3 commits September 14, 2022 11:28

Use xlsxwriter width specification

715c65d

Add dependency change to the release notes

d30b932

Merge pull request #1 from danielhuppmann/feature/add-xlsxwriter-width

a8c77af

Feature/add xlsxwriter width

phackstock requested a review from danielhuppmann September 14, 2022 15:16

danielhuppmann approved these changes Sep 14, 2022


danielhuppmann merged commit fde3690 into IAMconsortium:main Sep 14, 2022

phackstock deleted the feature/add-xlsxwriter branch September 14, 2022 16:01
