Skip to content
This repository has been archived by the owner on May 14, 2024. It is now read-only.

Add SPDX-License-Identifier, and use LICENSES/ directory layout in collections and ansible-core? #112

Closed
felixfontein opened this issue Jun 17, 2022 · 32 comments

Comments

@felixfontein
Copy link
Contributor

Summary

ansible-core (and quite a few collections) use simplified license headers like

# GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)
# Simplified BSD License (see licenses/simplified_bsd.txt or https://opensource.org/licenses/BSD-2-Clause)

and ansible-core has their main license file (GPLv3+) in COPYING, and all other licenses moved into a directory called licenses/.

While working on similar unifications (ansible-collections/community.crypto#478, ansible-collections/community.general#4847, ansible-collections/community.docker#386), @gotmax23 suggested in ansible-collections/community.crypto#478 (comment) to instead use the more standard SPDX-License-Identifier identifier for specifying licenses in files, and use a LICENSES/ directory with license files named after the SPDX identifiers in there - these are parts of the REUSE standard (https://reuse.software/). The license headers would look like this:

# SPDX-License-Identifier: GPL-3.0-or-later
# SPDX-License-Identifier: BSD-2-Clause

I personally prefer the more human-friendly simplified license headers that ansible-core and a lot of collections already use, but I think the SPDX-License-Identifier is also really helpful to allow automatic extraction of this licensing information with existing tools. So what do you think about combining these as follows:

# Copyright: (c) 2022, ...
# GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)
# SPDX-License-Identifier: GPL-3.0-or-later

or

# Copyright: (c) 2022, ...
# Simplified BSD License (see licenses/simplified_bsd.txt or https://opensource.org/licenses/BSD-2-Clause)
# SPDX-License-Identifier: BSD-2-Clause

?

About LICENSES/: how about renaming licenses/ to LICENSES/ (resp. creating it in the first place, most collections have all their license files in the collection root) and switching to SPDX identifiers as filenames? I don't have a personal preference for the ansible-core or the REUSE style, but since the latter is standardized I think it's better to use it.

Two questions with regard to that are: a) should the main license file (often COPYING) be moved to LICENSES/GPL-3.0-or-later.txt as well, or stay as COPYING in the main directory? b) Should galaxy.yml list all license files, or just refer to the main license (GPLv3+)?

So what do you think, both about SPDX-License-Identifier and the LICENSES/ directory?

(For historical notes: ansible-core has been using licenses/ and the short BSD header since 2.5.0: ansible/ansible#31913 The very first use of the short license header I found in ansible/ansible#26812, which ended up in 2.4.0.0.)

@gotmax23
Copy link
Contributor

Thank you @felixfontein for starting this discussion. I think having consistent, machine-readable license information would make it easier to ensure that collections included in ansible are following our licensing requirements. This would also make it easier for me and other distro package maintainers who package ansible and/or the standalone collections to follow our respective distribution's licensing guidelines. Fully adopting the REUSE standard and using their linting tool would automatically ensure that we aren't missing any license files, which has been a problem for some collections. These are not the only benefits to his approach.

About LICENSES/: how about renaming licenses/ to LICENSES/ (resp. creating it in the first place, most collections have all their license files in the collection root) and switching to SPDX identifiers as filenames? I don't have a personal preference for the ansible-core or the REUSE style, but since the latter is standardized I think it's better to use it.

+1.

I personally prefer the more human-friendly simplified license headers that ansible-core and a lot of collections already use, but I think the SPDX-License-Identifier is also really helpful to allow automatic extraction of this licensing information with existing tools. So what do you think about combining these as follows:

I think it's better and less confusing to use one type of license notation, but at least REUSE doesn't forbid keeping existing license headers.

a) should the main license file (often COPYING) be moved to LICENSES/GPL-3.0-or-later.txt as well, or stay as COPYING in the main directory?

I would recommend the former, but I don't have a strong opinion here. It doesn't necessarily need to be one or the other. Note that Github will not detect the license of the repository if there's no COPYING, LICENSE, or similarly named file in the repository root.

b) Should galaxy.yml list all license files, or just refer to the main license (GPLv3+)?

I think both the README and galaxy.yml should include all licenses that apply to the codebase. Even though the collections and ansible bundle are distributed and primarily licensed as GPL-3.0-or-later, the other licenses that may apply to the codebase still apply and need to be followed. For permissive licenses, that doesn't usually entail much more than preserving the license text and copyrights, but that doesn't mean that the GPL erases them.

@felixfontein
Copy link
Contributor Author

CC @sivel / @nitzmahone. We are wondering what a) you think about this, b) whether you remember why the current license header / licenses/ directory looks as it does (including COPYING being outside of that directory), and c) whether you would be interested in also using the SPDX-License-Identifier and/or other things.

@briantist
Copy link

I think that there's no downside to adding a machine-readable header for discovering license information, so we should probably encourage or at least allow it. I'd be fine with combining it with the more human-friendly one as well.

Generally, I like the idea of moving licenses to a dedicated directory, but the GitHub support relying on a file in the root can't be overlooked, so I don't think I'd want to do anything that stops that from working.

I think both the README and galaxy.yml should include all licenses that apply to the codebase.

Agree.

@briantist
Copy link

Oh and as a practical matter I wanted to call out the relatively new availability of git-blame-ignore-revs support on GitHub:
https://docs.github.com/en/repositories/working-with-files/using-files/viewing-a-file#ignore-commits-in-the-blame-view

Change the headers in nearly every file in a repository is probably a good use case for this.

@mariolenz
Copy link
Contributor

Generally, I like the idea of moving licenses to a dedicated directory, but the GitHub support relying on a file in the root can't be overlooked, so I don't think I'd want to do anything that stops that from working.

How about having all licenses in licenses/ and make COPYING a link to the main license, for example COPYING -> licenses/GPL-3.0-license.txt? Would this be OK for GitHub?

@gotmax23
Copy link
Contributor

gotmax23 commented Jun 23, 2022

How about having all licenses in licenses/ and make COPYING a link to the main license, for example COPYING -> licenses/GPL-3.0-license.txt? Would this be OK for GitHub?

COPYING -> LICENSES/GPL-3.0-license.txt doesn't work, but it works the other way around (i.e. LICENSES/GPL-3.0-license.txt -> COPYING ). See https://github.com/gotmax23/community.docker as an example.

@gotmax23
Copy link
Contributor

I put together a draft PR against community.docker to show folks what this looks like. For reference, I've also included the commands I used to automate the process.

@felixfontein
Copy link
Contributor Author

I would keep the # GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt) comments (instead of removing them). This a) makes the sanity tests pass again, and b) I find that comment nicer to read as a human than # SPDX-License-Identifier: GPL-3.0-or-later. Having both comments would be perfect IMO.

@gotmax23
Copy link
Contributor

gotmax23 commented Jun 23, 2022

I would keep the # GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt) comments (instead of removing them).

Yeah, I know that part was still in the air. Unfortunately, reuse addheader removes those and doesn't provide an easy way to configure it not to do that. I will look into it. However, I do think the sanity tests should be fixed. I don't think it's very sane ;D for ansible-test to require one specific license header format.

@felixfontein
Copy link
Contributor Author

The sanity check looks for GNU General Public License and at least one of version 3 and v3.0 in the first 20 lines (https://github.com/ansible/ansible/blob/devel/test/lib/ansible_test/_util/controller/sanity/validate-modules/validate_modules/main.py#L518-L525).

Also the developer documenation specifies how the license header should look: https://docs.ansible.com/ansible-core/devel/dev_guide/developing_modules_documenting.html#copyright

@felixfontein
Copy link
Contributor Author

Ok, maybe it's easier to discuss a very concrete proposal. So let me propose the following:

  1. Always add the SPDX-License-Identifier: directly after the existing license identifier, as in:
    # Copyright: (c) 2022, ...
    # GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)
    # SPDX-License-Identifier: GPL-3.0-or-later
    
  2. Create a directory LICENSES/ with all licenses from the project in it, named according to their SPDX license identifiers.
  3. The main license (usually GPLv3) should be kept (also) as LICENSE or COPYING in the root directory (so that GitHub can easily identify and correctly show the license), and the corresponding license file in LICENSES/ should be made a relative symlink to that file.

Reasoning:

  • This proposal allows to use automatic tooling (REUSE) to scan for licenses in the collection (https://reuse.software/spec/).
  • This proposal is compatible with the GPLv3+ license check in the existing ansible-test validate-modules sanity check, and thus is also nice to read for humans.

@felixfontein
Copy link
Contributor Author

@ansible-community/steering-committee what do you think of the above proposal?

@gotmax23
Copy link
Contributor

gotmax23 commented Jun 25, 2022

  1. Always add the SPDX-License-Identifier: directly after the existing license identifier, as in:

Reuse's automated tooling (reuse addheader) does not support this. It always adds the header to the top of the file, under the shebang (if there is one). Unless we want to write a new script to add these headers automatically (I, for one, don't really want to), it might be better to just keep it where reuse addheader puts it. If we don't have some automated way to handle this, it will be very tedious to add the headers to existing collections.

Besides this, I obviously support this proposal :).

@gotmax23
Copy link
Contributor

gotmax23 commented Jun 25, 2022

This proposal allows to use automatic tooling (REUSE) to scan for licenses in the collection (https://reuse.software/spec/).

FTR, reuse-tool is not the only tool that can detect # SPDX-License-Identifier comments. Because this format is standardized, most license scanning tools already support it.

@felixfontein
Copy link
Contributor Author

felixfontein commented Jun 26, 2022

  1. Always add the SPDX-License-Identifier: directly after the existing license identifier, as in:

Reuse's automated tooling (reuse addheader) does not support this. It always adds the header to the top of the file, under the shebang (if there is one). Unless we want to write a new script to add these headers automatically (I, for one, don't really want to), it might be better to just keep it where reuse addheader puts it. If we don't have some automated way to handle this, it will be very tedious to add the headers to existing collections.

I hacked a little script together (https://gist.github.com/felixfontein/3d9502e22d1461dc3baadddeeb9d9cd9) that does this. It definitely does need someone to look over the results; I've tested it with community.general, community.crypto, community.docker, and ansible-core, and so far there was one file in community.docker and two files in ansible-core that needed manual changes. (This was mainly dual-licensed vendored content, or files that had content covered by multiple licenses.)

(It also updates the paths to the license files if the comment follows the 'standard' one closely enough.)

@mariolenz
Copy link
Contributor

  1. Always add the SPDX-License-Identifier: directly after the existing license identifier, as in:
    # Copyright: (c) 2022, ...
    # GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)
    # SPDX-License-Identifier: GPL-3.0-or-later
    
  2. Create a directory LICENSES/ with all licenses from the project in it, named according to their SPDX license identifiers.
  3. The main license (usually GPLv3) should be kept (also) as LICENSE or COPYING in the root directory (so that GitHub can easily identify and correctly show the license), and the corresponding license file in LICENSES/ should be made a relative symlink to that file.

Sounds good to me. But I'm not sure if you want this to be a recommendation or a requirement for collections. Personally, I would start with making this a recommendation first and a requirement later.

Additionally, we should have a way to test this. This would mean core would be involved and has to implement this in the sanity(?) tests. That is, if we don't want to implement our own tests for the community package. And I think we should try to avoid this.

@felixfontein
Copy link
Contributor Author

I think this would be a recommendation for at least some time. I think the main goal is to first get ansible-core and the steering committee (and hopefully larger parts of the community) to agree on such a recommendation, and then to see who else would adopt this. The steering committee has several members who are involved with some of the Ansible dev tooling (ansible-lint, ansible-navigator, ...), so I would expect that if the SC agrees on such a recommendations, these projects would also adopt this, at least over time. That's why I'd really like to see some more @ansible-community/steering-committee feedback on this.

@mariolenz
Copy link
Contributor

I think the main goal is to first get ansible-core and the steering committee (and hopefully larger parts of the community) to agree on such a recommendation, and then to see who else would adopt this. [...] That's why I'd really like to see some more @ansible-community/steering-committee feedback on this.

I might be be wrong, but I think this discussion hasn't been announced via Bullhorn yet. Maybe this would help to get more feedback from ansible-core, the steering committee and the community.

@felixfontein
Copy link
Contributor Author

We never announced new discussions on The Bullhorn except if we were looking for community feedback. Right now we are looking for feedback from the SC and from the core team, and both have already been pinged.

@Andersson007
Copy link
Contributor

My question is if we're going to standardize this way of organizing work with license info (i.e. it'll be a part of the Collection requirements if SC is OK)?

Regarding the questions:

  1. Always add the SPDX-License-Identifier: directly after the existing license identifier, as in:
    # Copyright: (c) 2022, ...
    # GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)
    # SPDX-License-Identifier: GPL-3.0-or-later
    

+1

  1. Create a directory LICENSES/ with all licenses from the project in it, named according to their SPDX license identifiers.

+1

  1. The main license (usually GPLv3) should be kept (also) as LICENSE or COPYING in the root directory (so that GitHub can easily identify and correctly show the license), and the corresponding license file in LICENSES/ should be made a relative symlink to that file.

+1

@mariolenz
Copy link
Contributor

1. Always add the `SPDX-License-Identifier:` directly after the existing license identifier, as in:
   ```
   # Copyright: (c) 2022, ...
   # GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)
   # SPDX-License-Identifier: GPL-3.0-or-later
   ```

Should we keep referring to COPYING, or should we change this to:

# Copyright: (c) 2022, ...
# GNU General Public License v3.0+ (see LICENSES/GPL-3.0-or-later.txt or https://www.gnu.org/licenses/gpl-3.0.txt)
# SPDX-License-Identifier: GPL-3.0-or-later

I think it would be more consistent if the copyright always refers a file in LICENSES/.

My question is if we're going to standardize this way of organizing work with license info (i.e. it'll be a part of the Collection requirements if SC is OK)?

I think we should make this a recommendation first, but would love to see it as a requirement later.

@sivel
Copy link

sivel commented Jul 5, 2022

The Core team has discussed this, and have decided that ansible-core will not be adopting this proposal. We may adopt the change to rename the license files to their SPDX equivalent, but we would need to determine how the license file for GPL3.0+ would be named, since the license file itself does not dictate "only" vs "or later".

@felixfontein
Copy link
Contributor Author

@sivel would you mind adding a bit more details on a) why you don't want to include SPDX-License-Identifier, and b) why not rename licenses/ to LICENSES/? I'm not asking to convince the core team to change their opinions, but I'm curious to better understand what the reasons are not to do this.

@sivel
Copy link

sivel commented Jul 5, 2022

A few of the items I remember from our discussion...

  1. In the end we just agreed that adding SPDX-License-Identifier didn't offer a benefit for us
    1. We don't realistically benefit more on a per file basis over what we already have in place.
    2. We aren't open to blanket modification of all our files to include this
    3. The end result doesn't justify the work required to achieve it
  2. We didn't really discuss licenses/ -> LICENSES/. I see no problem with this, but I'm also not sure it really matters. This is kind of a bike shed topic.

We actually discussed a preference over removing all per file license headers where the file matched the project license, over adding more information per file. (This has not been discussed with Red Hat legal)

@felixfontein
Copy link
Contributor Author

@sivel thanks a lot for writing this up!

@mariolenz
Copy link
Contributor

@felixfontein Just because the Core team doesn't want to follow this proposal doesn't mean we can't make this a recommendation for collections, does it? I think it's a good proposal that might help a lot, no matter if Core will adopt this or not.

@felixfontein
Copy link
Contributor Author

@mariolenz that is definitely true. I'm still curious for feedback on this from some more SC members. If we do some recommendations regarding this topic, I tihnk we should have a large majority of SC behind it.

@Andersson007
Copy link
Contributor

My question is if we're going to standardize this way of organizing work with license info (i.e. it'll be a part of the Collection requirements if SC is OK)?

I think we should make this a recommendation first, but would love to see it as a requirement later.

Okay, SGTM, thanks for clarifying

@felixfontein
Copy link
Contributor Author

I created a PR which implements my proposal (#112 (comment)) for community.docker: ansible-collections/community.docker#430
For this I used https://gist.github.com/felixfontein/3d9502e22d1461dc3baadddeeb9d9cd9, and then pasted the GPL header to all files that had no header yet, and simplified some of the longer headers that were still there (in tests/).

softwarefactory-project-zuul bot pushed a commit to ansible-collections/community.vmware that referenced this issue Jul 16, 2022
Prepare adding SPDX-License-Identifier

SUMMARY
Prepare adding SPDX-License-Identifier
ISSUE TYPE

Docs Pull Request

COMPONENT NAME
ADDITIONAL INFORMATION
ansible-community/community-topics#112
@felixfontein
Copy link
Contributor Author

I added a tool which adds a 'default' license header to all files which don't have one to my gist: https://gist.github.com/felixfontein/3d9502e22d1461dc3baadddeeb9d9cd9#file-add-default-license-py
Use with caution and check the results before adding them :-) I also added an extra sanity check for SPDX-License-Identifier and the presence of Copyright to 'my' collections, see the links shown above.

Finally I noticed that galaxy-importer does not support the PSF-2.0 license (used by the vendored code for LooseVersion / StrictVersion); I created ansible/galaxy-importer#175 to address this.

softwarefactory-project-zuul bot pushed a commit to ansible-collections/community.vmware that referenced this issue Jul 20, 2022
…missing license file (#1398)

Use SPDX-License-Identifier, mention all licenses in galaxy.yml, add missing license file

SUMMARY
Implements ansible-community/community-topics#112 (comment).
Also adds a missing license file for the BSD-2-Clause licensed vmware_spbm and vmware_rest_client module utils.
ISSUE TYPE

Docs Pull Request
Feature Pull Request

COMPONENT NAME
collection
ADDITIONAL INFORMATION
I've also tried to unify the format. For example, sometimes there was an empty comment before the copyright (# ) and sometimes there was an empty line without a comment.
cc @felixfontein

Reviewed-by: Felix Fontein <[email protected]>
machacekondra pushed a commit to machacekondra/vmware.vmware that referenced this issue Mar 8, 2024
…missing license file (#1398)

Use SPDX-License-Identifier, mention all licenses in galaxy.yml, add missing license file

SUMMARY
Implements ansible-community/community-topics#112 (comment).
Also adds a missing license file for the BSD-2-Clause licensed vmware_spbm and vmware_rest_client module utils.
ISSUE TYPE

Docs Pull Request
Feature Pull Request

COMPONENT NAME
collection
ADDITIONAL INFORMATION
I've also tried to unify the format. For example, sometimes there was an empty comment before the copyright (# ) and sometimes there was an empty line without a comment.
cc @felixfontein

Reviewed-by: Felix Fontein <[email protected]>

This commit was initially merged in https://github.com/ansible-collections/community.vmware
See: ansible-collections/community.vmware@19b8ffe
@mariolenz
Copy link
Contributor

@felixfontein Please close this issue if done, or open a new forum topic and then close the issue with a pointer to the new discussion: Community-topics: Archiving the repo

@felixfontein
Copy link
Contributor Author

I think this is kind of done for now. There never has been consensus to use this everywhere. Some collections do use it, some others do not, Core does not use it, no idea about other parts of the ecosystem.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants