Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass through Engine kwargs in ExcelWriter #43445

Merged
merged 17 commits into from
Nov 25, 2021

Conversation

feefladder
Copy link
Contributor

@feefladder feefladder commented Sep 7, 2021

This is the revised version of #42214. In fact, in xlsxwriter, all kwargs are passed like so:

ExcelWriter("path_to_file.xlsx",engine="xlsxwriter",engine_kwargs={"options":{"nan_inf_to_errors":True}})

which then becomes internally:

from xlsxwriter import Workbook
Workbook("path_to_file.xlsx",options={"nan_inf_to_errors":True})

so my initial fix was wrong in that sense, because I made engine_kwargs able to be passed in like this:

ExcelWriter("path_to_file.xlsx",engine="xlsxwriter",engine_kwargs={"nan_inf_to_errors":True})

and the resulted internals:

Workbook("pat_to_file.xlsx",{"nan_inf_to_errors":True})

so #42214 was actually wrong (we want users to actually pass in kwargs.

However, there was still the issue of #43442, where the following engines do have some options that were silently ignored:

  • openpyxl can set iso_dates and write_only
  • xlwt can set style_compression (not sure why anybody would want this, but people may)
  • xlsxwriter has many options

so I added tests for that in engine_kwargs.

@feefladder feefladder changed the title Engine kwargs Pass through Engine kwargs in ExcelWriter Sep 7, 2021
pandas/io/excel/_base.py Outdated Show resolved Hide resolved
pandas/tests/io/excel/test_openpyxl.py Outdated Show resolved Hide resolved
pandas/tests/io/excel/test_openpyxl.py Show resolved Hide resolved
pandas/tests/io/excel/test_xlwt.py Outdated Show resolved Hide resolved
@lithomas1 lithomas1 added the IO Excel read_excel, to_excel label Sep 9, 2021
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor request below. Also some CI failures related to the docstrings.

pandas/tests/io/excel/test_openpyxl.py Outdated Show resolved Hide resolved
@feefladder
Copy link
Contributor Author

feefladder commented Sep 20, 2021

I had 6 times in python scripts/validate_docstrings.py pandas.ExcelWriter that ExcelWriter is not defined So I changed those to pd.ExcelWriter.
Then that

>>> with pd.ExcelWriter(
    ...     "path_to_file.xlsx",
    ...     engine="openpyxl",
    ...     mode="a",
    ...     engine_kwargs={"keep_vba": True}
    ... ) as writer:
    ...     df.to_excel(writer)

had that Sheet1 already exists, and if_sheet_exists is set to "error". I added sheet_name="Sheet2
so that is changed, but I think this is a bit weird, and does not add to simplicity of the example.
and twice that I forgot a space after colon
Now only 'error' is that there is no "See Also" section, but I think that should pass?

@rhshadrach
Copy link
Member

Thanks @joeperdefloep and apologies for the delay. Docs look good (addition of pd. is on master now as well). Looks like there is one failure that is related:

FAILED pandas/tests/io/excel/test_openpyxl.py::test_engine_kwargs_append[.xlsx]

@jreback jreback added this to the 1.4 milestone Sep 29, 2021
@jreback
Copy link
Contributor

jreback commented Oct 3, 2021

@joeperdefloep can you merge master and fixup according to @rhshadrach comments

@alimcmaster1 alimcmaster1 self-assigned this Oct 18, 2021
@erfannariman
Copy link
Member

erfannariman commented Oct 28, 2021

@joeperdefloep @alimcmaster1 I am willing to finish this one up, if you guys don't have the time, since we need this in some of our projects.

@feefladder
Copy link
Contributor Author

Thanks @joeperdefloep and apologies for the delay. Docs look good (addition of pd. is on master now as well). Looks like there is one failure that is related:

FAILED pandas/tests/io/excel/test_openpyxl.py::test_engine_kwargs_append[.xlsx]

Hello everyone. Apologies for the delays

That's funny... It passed on my machine.... This is the error:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\VSSADM~1\\AppData\\Local\\Temp\\wrE4sPxHRxGbeUSKoQgfnJd1mC0iTd.xlsx'

I just changed the tests. Should be good now. ARM seems to be failing, but that is quite the black box to me...

@feefladder
Copy link
Contributor Author

I now get a lot of the following errors:

AttributeError: module 'aiobotocore' has no attribute 'AioSession'

seems to be related to this issue:
fsspec/s3fs#514

pandas/io/excel/_base.py Outdated Show resolved Hide resolved
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, some minor requests on the tests.

Comment on lines 112 to 119
@pytest.mark.parametrize(
"engine_kwargs", [{"data_only": True}, {"keep_vba": True}, {"keep_links": False}]
)
def test_engine_kwargs_append(ext, engine_kwargs):
# GH 43445
# only read_only=True will give something easy that can be verified (maybe
# keep_links or data_only could also work, but that'd be more complicated)
# arguments are passed through to load_workbook()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read_only would be sufficient. Need to verify the kwarg gets through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought testing for all available parameters would be nice, since then we will also be warned when API changes on openpyxl's side...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, read-only will not work here, and give different errors on windows than linux, this is why the previous test failed

Copy link
Member

@rhshadrach rhshadrach Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the test correctly, the passing of kwargs through to openpyxl could be entirely removed from pandas and this test would pass. Let me know if you're not sure of a way to improve that (or if my assessment is wrong).

Copy link
Contributor Author

@feefladder feefladder Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, that's true. That's why I have the other test: test_engine_kwargs_append_invalid that tests for an invalid keyword argument in load_workbook. Also the other test test_engine_kwargs_append_data_only tests for correct use of openpyxl's data_only being passed through. That would make this test redundant indeed? It just felt logical to first test if passing an engine kwarg does not raise an error normally, then test if it does raise with an invalid argument and then test if the passing of engine kwargs works well together with openpyxl. Then in that order, people will know more precisely where the error lies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell me if you want it removed!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the summary, very helpful. Yes, I think this test can be removed as it's redundant with test_engine_kwargs_append_data_only. In general, we'd rather not have to change tests if the API of a third party engine changes and it doesn't negatively impact pandas. It is okay to do so when necessary, as is in the data_only test.

pandas/tests/io/excel/test_openpyxl.py Outdated Show resolved Hide resolved
pandas/tests/io/excel/test_openpyxl.py Outdated Show resolved Hide resolved
@feefladder
Copy link
Contributor Author

@jreback friendly ping :)

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, can you resolve conflict.

@jreback jreback merged commit fd95026 into pandas-dev:master Nov 25, 2021
@jreback
Copy link
Contributor

jreback commented Nov 25, 2021

thanks @joeperdefloep very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
6 participants