[python] remove unnecessary files to reduce sdist size #3579
Short Description
This PR removes unnecessary files from the tarball created by `python setup.py sdist`, to make that source distribution smaller.
Long Description
This week I've been revisiting a conference talk I gave in April, where I showed how to deploy a LightGBM model on AWS Lambda, using a Python Lambda.
To make external dependencies like `lightgbm` available to a Python Lambda, you have to create something called a "Lambda Layer", which is basically a volume whose files get mounted into the filesystem used by your code. The sum of all "layers" that you add cannot exceed 250 MB uncompressed. This made it hard to set up, for example, a Lambda that used `lightgbm`, `pandas`, and `scikit-learn`. I had to do some surgery on the packages' source distributions to get under the limit: https://github.com/jameslamb/talks/blob/main/cloud-intro/scripts/create-layers.sh
To help people using Lambdas or any other settings that are very sensitive to code footprint, this PR proposes some changes to remove unnecessary files from the source distribution of the Python package.
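To see how close a set of unpacked layers is to the 250 MB uncompressed budget described above, something like the following could be used. This is a minimal sketch: the function names are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes and that each layer has already been unpacked into its own directory.

```python
import os

# Assumption: "250 MB" is interpreted as 250 * 1024 * 1024 bytes.
LAYER_LIMIT_BYTES = 250 * 1024 * 1024


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted twice.
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def fits_in_layer_budget(layer_dirs, limit_bytes=LAYER_LIMIT_BYTES):
    """Return (total_bytes, fits) for a list of unpacked layer directories."""
    total = sum(dir_size_bytes(d) for d in layer_dirs)
    return total, total <= limit_bytes
```

Calling `fits_in_layer_budget` with the directories for each unpacked layer returns the combined size and whether it is within the limit.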
If you look in the `sdist` logs, there are a bunch of files from the `compute` submodule that don't need to be included in `lightgbm`, like that submodule's tests and documentation.
    cd python-package
    python setup.py sdist
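Besides reading the build logs, the contents of the generated tarball can be inspected directly. The sketch below groups the uncompressed file sizes in an sdist by leading path components, which makes large bundled directories easy to spot. The function names and the `depth` parameter are hypothetical, and the path to the tarball has to be supplied by the caller.

```python
import tarfile
from collections import Counter


def sdist_size_by_prefix(sdist_path, depth=2):
    """Sum uncompressed file sizes in a tarball, grouped by the
    first `depth` components of each member's path."""
    sizes = Counter()
    # "r:*" lets tarfile detect the compression (gzip, bz2, ...) itself.
    with tarfile.open(sdist_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                prefix = "/".join(member.name.split("/")[:depth])
                sizes[prefix] += member.size
    return sizes


def print_largest(sizes, n=10):
    """Print the `n` largest groups, biggest first."""
    for prefix, size in sizes.most_common(n):
        print(f"{size / 1024:10.1f} KiB  {prefix}")
```

Running `print_largest(sdist_size_by_prefix(path_to_sdist, depth=3))` on the tarball written to `dist/` lists the heaviest directories first.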
This pull request proposes changes to exclude them. I tested on my Mac and found the following changes in the package size.
[Table: sdist and wheel sizes on master and with this PR]
It makes sense that the wheel didn't change size, since we don't include `compute` in it.
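A size comparison like this can be scripted roughly as follows. This is a minimal sketch and not the script used for this PR: the function names are hypothetical, and it assumes the sdist and wheel for each branch were built into separate directories.

```python
import os


def artifact_sizes(dist_dir):
    """Map each .tar.gz or .whl file name in `dist_dir` to its size in bytes."""
    sizes = {}
    for name in sorted(os.listdir(dist_dir)):
        if name.endswith((".tar.gz", ".whl")):
            sizes[name] = os.path.getsize(os.path.join(dist_dir, name))
    return sizes


def size_changes(before, after):
    """Return {name: after - before} for artifacts present in both builds."""
    return {name: after[name] - before[name] for name in before if name in after}
```

A negative value in the result of `size_changes` means the artifact got smaller, and a value of zero means its size did not change.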
Notes for Reviewers
If you agree with the spirit of this PR, we should also be careful to exclude unnecessary files from the submodules introduced in #3405, but that PR already has a lot of comments, so I'd rather do that as a follow-up after #3405 is merged.
For reference, the `compute` submodule comes from https://github.com/boostorg/compute.