Fix extracting license information for pypi packages #518

Merged: 2 commits, Apr 13, 2023

qtomlinson
Collaborator

No description provided.

spdx-correct had trouble parsing LGPL licenses. An earlier change,
"Fixed handling of GNU LGPL licenses in spdx-correct", addressed this by
patching spdx-correct with a file in the /patches directory. However, the
Dockerfile did not include /patches, so that fix never took effect in
the Docker deployment.

A recent spdx-correct release appears to resolve the LGPL issues the
patch was meant to fix, so this change upgrades spdx-correct to the most
recent version. LGPLv2 and LGPLv2+ are still not identified correctly,
so a patch is added for those specific cases.

The Dockerfile is also updated so that the patch takes effect in the
container deployment.

Test cases:
        "url": "cd:/pypi/pypi/-/pycountry/22.3.5"
        "url": "cd:/pypi/pypi/-/chardet/5.1.0"
        "url": "cd:/pypi/pypi/-/PyGObject/3.42.0"
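
One way to picture the special-casing described in this commit is a small override table consulted before the general-purpose corrector. The sketch below is Python for illustration only (the crawler patches the JavaScript spdx-correct package). `SPECIAL_CASES`, `normalize_license`, and the `fallback` parameter are hypothetical names, and the SPDX identifiers chosen for LGPLv2 and LGPLv2+ are an assumption rather than something taken from the actual patch.

```python
from typing import Callable, Optional

# Hypothetical override table: inputs the general corrector is assumed to
# get wrong, mapped to the SPDX identifiers assumed to be intended.
SPECIAL_CASES = {
    "lgplv2": "LGPL-2.0-only",
    "lgplv2+": "LGPL-2.0-or-later",
}


def normalize_license(
    raw: str, fallback: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Return an SPDX identifier for `raw`, checking special cases first."""
    key = raw.strip().lower()
    if key in SPECIAL_CASES:
        return SPECIAL_CASES[key]
    # Everything else is delegated to the general-purpose corrector.
    return fallback(raw)
```

Keeping the overrides in a separate table means they can be dropped entry by entry once the upstream corrector handles those inputs itself.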
Besides the info.classifiers entries in the registry data, which are
already used to extract license information, the registry data also
contains info.license. Use info.license as a source of license
information when the classifiers contain none.

Test cases:
pypi/pypi/-/dnspython/1.11.0
pypi/pypi/-/pytorch-ignite/0.5.0.dev20220727
pypi/pypi/-/mitmproxy-wireguard/0.1.10
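
The fallback described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the crawler's JavaScript implementation: the function names are hypothetical, and the input is assumed to follow the PyPI JSON layout with `info.classifiers` and `info.license`. It returns the raw license text, which would still need to be normalized to an SPDX identifier afterwards.

```python
from typing import Any, Dict, List, Optional

LICENSE_PREFIX = "License ::"


def licenses_from_classifiers(classifiers: List[str]) -> List[str]:
    """Return the last segment of every 'License :: ...' classifier."""
    return [
        c.split("::")[-1].strip()
        for c in classifiers
        if c.strip().startswith(LICENSE_PREFIX)
    ]


def extract_declared_license(registry_data: Dict[str, Any]) -> Optional[str]:
    """Prefer license classifiers; fall back to info.license if there are none."""
    info = registry_data.get("info") or {}
    from_classifiers = licenses_from_classifiers(info.get("classifiers") or [])
    if from_classifiers:
        return from_classifiers[0]
    # No license classifier present: use the free-text license field.
    license_field = (info.get("license") or "").strip()
    return license_field or None
```

A package whose classifiers list only programming languages, but whose `info.license` is set, would then still yield a license string.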
@qtomlinson qtomlinson marked this pull request as ready for review April 11, 2023 23:12
@qtomlinson
Collaborator Author

@mpcen ready for review

Contributor

@jeffwilcox jeffwilcox left a comment

Makes sense. Will share this with Manny to take a look as well.

@mpcen mpcen merged commit a28303b into clearlydefined:master Apr 13, 2023
qtomlinson pushed a commit to qtomlinson/crawler that referenced this pull request Feb 6, 2024
Fix extracting license information for pypi packages
@qtomlinson qtomlinson deleted the qt/fix_lgpl branch February 6, 2024 04:15