Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Remove docstrings from inline cython code #46934

Closed

Conversation

wlach
Copy link
Contributor

@wlach wlach commented May 3, 2022

These functions aren't externally available, and removing them saves
a small amount of space (approximately 250k in the size of the generated
.so's on a linux64 machine).

These functions aren't externally available, and removing them saves
a small amount of space (approximately 250k in the size of the generated
.so's on a linux64 machine).
@wlach
Copy link
Contributor Author

wlach commented May 3, 2022

More details:

base sizes: https://gist.github.com/wlach/d1a70c3246e51137e71d6f992ff34149
"no docstrings" sizes: https://gist.github.com/wlach/ac2151d94ba1c895f530ee1585c545c9

Calculating savings:

>>> import pandas
>>> df1 = pandas.read_csv('cython-sizes-base.csv')
>>> df1['bytes'].sum()
90027048
>>> df2 = pandas.read_csv('cython-sizes-no-docstrings.csv')
>>> df2['bytes'].sum()
89744976

I am especially interested in this for the case of pyodide, where the load times of pandas is a very large bottleneck: pyodide/pyodide#347

I don't expect this to help that much, but any savings (which doesn't affect end user functionality) is welcome.

wlach added a commit to wlach/pyodide that referenced this pull request May 3, 2022
Local patch of pandas-dev/pandas#46934

I don't expect this to make a user-noticeable improvement, but it seems
worth doing regardless. Locally I see a reduction from 4546988 for the
whole wheel to 4545234 (so just under 1k) although some of the .so's
inside are a few kB smaller.
@rth
Copy link
Contributor

rth commented May 3, 2022

Thanks for the PR! I'm not sure they are really private, because this does break some of the tests checking docstrings.

@jbrockmendel
Copy link
Member

I don't expect this to help that much, but any savings (which doesn't affect end user functionality) is welcome.

If you can get this passing the CI, this sounds good to me.


_CYTHON_INSTALLED = parse_version(_CYTHON_VERSION) >= parse_version(min_cython_ver)
Options.docstrings = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment pointing back to this PR

@wlach
Copy link
Contributor Author

wlach commented May 4, 2022

Thanks for the PR! I'm not sure they are really private, because this does break some of the tests checking docstrings.

Yes, I think you're right on closer inspection (I had a hard time understanding the test runs) -- and it's quite likely it's the larger docstrings which are the ones that are really needed. This is probably not worth pursuing, but I'll leave this open for a bit in case I'm missing something obvious.

@lithomas1
Copy link
Member

Perhaps you could try making this a compile time option that is disabled by default?

@wlach
Copy link
Contributor Author

wlach commented May 5, 2022

Perhaps you could try making this a compile time option that is disabled by default?

Not my call, but I'd personally be a bit reluctant to do this -- the savings aren't IMO great enough to justify the increased complexity.

@mroeschke
Copy link
Member

As an aside, if you're looking to implement space savings, there has been consensus to remove tests from the distribution: #30741

@rth
Copy link
Contributor

rth commented May 5, 2022

We are already unvendoring pandas tests as a separate package in Pyodide, so that is not an issue.

@jreback jreback added the Docs label May 8, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Jun 8, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 8, 2022
@datapythonista
Copy link
Member

Looks like we're not moving forward with this, and the discussion is stale, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants