QUESTION: Rationale for classifiers taking precedence over license info #523

Open
elrayle opened this issue Sep 13, 2023 · 5 comments

@elrayle
Collaborator

elrayle commented Sep 13, 2023

Description

It appears that for PyPI, classifiers take precedence over the license field when extracting license information. The code makes it clear how this happens, but I'm wondering about the rationale for the approach. Also, if one of the two is determined to be correct and the other is not, is there a process for updating the license or the classifier as needed?

Test

  it('parses the correct license information from classifiers in registry data', () => {
    const registryData = JSON.parse(fs.readFileSync('test/fixtures/pypi/registryData_lgpl2.json'))
    const declared = fetch._extractDeclaredLicense(registryData)
    expect(declared).to.be.equal('LGPL-2.0-only')
  })

Fixture Data

classifiers

    "classifiers": [
      ...
      "License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)",
      ...
    ],

license

    "license": "LGPL 2.1",

Expected

With a specific license given in info.license, I would expect the license to be either LGPL-2.0-or-later or LGPL-2.1-only.

Actual

Precedence is given to classifiers in function _extractDeclaredLicense, which produces license LGPL-2.0-only.

  _extractDeclaredLicense(registryData) {
    const licenseFromClassifiers = this._extractLicenseFromClassifiers(registryData)
    if (licenseFromClassifiers) return licenseFromClassifiers
    const license = get(registryData, 'info.license')
    return license && spdxCorrect(license)
  }
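
The ordering above can be reproduced in isolation with a minimal sketch. The two lookup tables are hypothetical stand-ins for `_extractLicenseFromClassifiers` and `spdxCorrect`, and the `LGPL-2.1-only` mapping in particular is an assumption for illustration, not the verified output of `spdxCorrect('LGPL 2.1')`.

```javascript
// Hypothetical stand-in for classifier parsing; only the fixture's classifier is mapped.
const CLASSIFIER_TO_SPDX = {
  'License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)': 'LGPL-2.0-only'
}

// Hypothetical stand-in for spdxCorrect; this mapping is assumed, not verified.
const LICENSE_FIELD_TO_SPDX = {
  'LGPL 2.1': 'LGPL-2.1-only'
}

// Return the first classifier that maps to a license, or null if none does.
function extractFromClassifiers(registryData) {
  const classifiers = (registryData.info && registryData.info.classifiers) || []
  for (const classifier of classifiers) {
    if (CLASSIFIER_TO_SPDX[classifier]) return CLASSIFIER_TO_SPDX[classifier]
  }
  return null
}

// Same ordering as _extractDeclaredLicense: classifiers first, info.license as fallback.
function extractDeclaredLicense(registryData) {
  const fromClassifiers = extractFromClassifiers(registryData)
  if (fromClassifiers) return fromClassifiers
  const license = registryData.info && registryData.info.license
  return (license && LICENSE_FIELD_TO_SPDX[license]) || null
}

// Reduced version of the fixture data shown above.
const registryData = {
  info: {
    classifiers: ['License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)'],
    license: 'LGPL 2.1'
  }
}

console.log(extractDeclaredLicense(registryData))
```

Because the classifier lookup succeeds, the function returns before `info.license` is ever read, so the more specific "LGPL 2.1" value has no influence on the result.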
@qtomlinson
Collaborator

qtomlinson commented Nov 2, 2023

@elrayle This will be a good topic to discuss in our next community meeting. Historically, license was only parsed from classifiers. This commit adds the functionality to extract license from info.license. To avoid any breaking changes, extracting from info.license is added as a fallback if no valid license is parsed from classifiers.

@qtomlinson
Collaborator

Another case: https://clearlydefined.io/definitions/pypi/pypi/-/UpSetPlot/0.9.0.

  • from Classifier, "BSD License" is converted to 'BSD-2-Clause' by spdxCorrect,
  • "license": "BSD-3-Clause"
  • ScanCode also detected BSD-3-Clause.

@qtomlinson
Collaborator

qtomlinson commented Jun 28, 2024

Reading through the PyPI metadata description, the license field seems to be more specific. Shall we give precedence to the license field over the classifiers in the metadata? @ariel11 @capfei @jeffwilcox @bduranc @Jeffrey-Luszcz @sgustafsson Any thoughts?

@qtomlinson
Collaborator

@capfei Thanks for your feedback! Our email discussion quoted below.

I agree with using the license field over classifiers, since the license field usually provides a clear license identifier. The classifier list is limited and can sometimes be a confusing combination.

Example of confusing classifiers but clear license info: clearlydefined/curated-data#27902 (comment)

Thank you,

Candice

@qtomlinson
Collaborator

While testing my PR, I found some cases where the classifier has more detailed information than the license field.

I have updated the precedence to prioritize the license field, unless the classifier has the version and the license field does not. I'm excited to hear your thoughts. Additionally, I have included 10 new test cases in my PR. Hopefully, this will help reduce the number of cases for curation.
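
The rule described above could be sketched roughly as follows. This is not the code from the PR: `pickDeclaredLicense` and `hasVersion` are hypothetical names, and treating "contains a digit" as "has a version" is a simplifying assumption made only for this illustration.

```javascript
// Hypothetical check: treat any digit in the identifier as version information.
// This is a crude assumption; the real PR may decide this differently.
function hasVersion(licenseId) {
  return /\d/.test(licenseId)
}

// Prefer the license field, unless only the classifier result carries a version.
function pickDeclaredLicense(fromLicenseField, fromClassifiers) {
  if (!fromLicenseField) return fromClassifiers || null
  if (!fromClassifiers) return fromLicenseField
  if (hasVersion(fromClassifiers) && !hasVersion(fromLicenseField)) return fromClassifiers
  return fromLicenseField
}

// The two cases from this thread: both license fields are specific, so they win.
console.log(pickDeclaredLicense('LGPL-2.1-only', 'LGPL-2.0-only'))
console.log(pickDeclaredLicense('BSD-3-Clause', 'BSD-2-Clause'))
// An unversioned license field value loses to a versioned classifier result.
console.log(pickDeclaredLicense('LGPL', 'LGPL-2.0-only'))
```

Under these assumptions the license field decides both disagreements raised earlier in this issue, while the classifier is still used when it is the only source or the only one that names a version.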


2 participants