Feature Request/Idea: Make sure that for datasets published with CC0 waiver or standard license, metadata exports include the waiver or license #8798

Open
jggautier opened this issue Jun 13, 2022 · 8 comments
Labels
Feature: Metadata, Feature: Terms & Licensing, FY25 Sprint 7 (2024-09-25 - 2024-10-09), GREI 2 Consistent Metadata, Size: 10 (a percentage of a sprint; 7 hours), UX/UI Input Needed (apply to issues involving UX or UI implications that need additional input)

Comments

@jggautier
Contributor

Overview of the Feature Request
Make sure that for datasets published with a CC0 waiver or a standard license, their metadata exports include the waiver or license.

What kind of user is the feature intended for?
Curator, Guest

What inspired the request?
While reviewing the metadata exports of datasets published in repositories running a Dataverse software version that includes the "multiple license" update, I noticed that for certain datasets published before that update, the metadata exports don't include the CC0 waiver or other standard license.

For example, after the multiple license update, datasets published before that update with a CC0 waiver and metadata entered in one or more "Terms of Use" fields were updated to include text in the Terms of Use field that read that "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions":

Screen Shot 2022-06-13 at 12 34 28 PM

But the OpenAIRE and Schema.org metadata exports of those datasets do not include information about the CC0 waiver:

  • OpenAIRE export of dataset with CC0 waiver plus metadata entered in one or more "Terms of Use" fields
  • Schema.org export of dataset with CC0 waiver plus metadata entered in one or more "Terms of Use" fields

For comparison, see the OpenAIRE and Schema.org metadata exports of a dataset with a CC0 waiver and nothing entered in any of the "Terms of Use" fields:

  • OpenAIRE export of dataset with CC0 waiver and no metadata entered in any of the "Terms of Use" fields
    Screen Shot 2022-06-13 at 12 43 14 PM

  • Schema.org export of dataset with CC0 waiver and no metadata entered in any of the "Terms of Use" fields
    Screen Shot 2022-06-13 at 12 42 50 PM
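
A quick way to spot exports that lack the waiver is to look at the license entry in the parsed export. The following is only a rough sketch, not how Dataverse itself does it: it assumes the Schema.org export is JSON-LD with a top-level `license` entry that is either a URI string or an object with a `url` key, and the function names (`license_uri`, `mentions_cc0`) are hypothetical helpers invented for this illustration.

```python
import json

# The Creative Commons CC0 1.0 URI, with and without TLS.
CC0_URIS = {
    "http://creativecommons.org/publicdomain/zero/1.0",
    "https://creativecommons.org/publicdomain/zero/1.0",
}


def license_uri(export_json: str):
    """Return the license URI found in a Schema.org JSON-LD export, or None.

    Assumption: the license lives in a top-level "license" entry that is
    either a plain string or an object carrying the URI under "url".
    """
    doc = json.loads(export_json)
    value = doc.get("license")
    if isinstance(value, dict):
        value = value.get("url")
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def mentions_cc0(export_json: str) -> bool:
    """True only if the export's license entry points at the CC0 waiver."""
    uri = license_uri(export_json)
    return uri is not None and uri.rstrip("/") in CC0_URIS
```

Running `mentions_cc0` over the exports of datasets known to have been published with a CC0 waiver would flag the ones whose exports no longer say so.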

What existing behavior do you want changed?
For datasets that were published with a CC0 waiver (or possibly a CC-BY license for some Dataverse repositories whose default license was CC-BY), include in their metadata exports information about those waivers or licenses. This will become more important as license metadata is included in more exports (e.g. being sent to DataCite (#5889)) and is indexed and made searchable (e.g. #7482).

@qqmyers
Member

qqmyers commented Jun 13, 2022

FWIW: The design discussion around this was specifically to not reference the standard license (or consider the dataset to be using that license) when there are custom terms (any field that appears when you pick custom terms).

I think the schema.org example is what was intended i.e. there is still a "license": "https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.1/customlicense?persistentId=doi:10.7910/DVN/WHNXKY", link in the metadata. That should be similar for other formats where the URI was sent before - we now send the custom terms/licenseURI.
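
For anyone consuming these exports, a small sketch of how the two cases might be told apart from the license URI alone. It assumes that custom terms are always exposed through a URL whose path ends in `customlicense`, as in the example above; whether that pattern holds across versions and installations is not confirmed here, and `describe_license_uri` is a hypothetical helper, not part of the Dataverse API.

```python
from urllib.parse import parse_qs, urlparse


def describe_license_uri(uri: str) -> dict:
    """Classify a license URI taken from a metadata export.

    Assumption: a path ending in "customlicense" means the dataset has
    custom terms rather than a standard license or waiver.
    """
    parsed = urlparse(uri)
    segments = [s for s in parsed.path.split("/") if s]
    is_custom = bool(segments) and segments[-1] == "customlicense"

    info = {"custom_terms": is_custom, "persistent_id": None, "version": None}
    if is_custom:
        # The dataset's persistent ID travels in the query string.
        ids = parse_qs(parsed.query).get("persistentId")
        info["persistent_id"] = ids[0] if ids else None
        # The version number, if present, follows the "versions" segment.
        if "versions" in segments:
            i = segments.index("versions")
            if i + 1 < len(segments) - 1:
                info["version"] = segments[i + 1]
    return info
```

Applied to the URL quoted above, this reports custom terms for `doi:10.7910/DVN/WHNXKY`, version `2.1`; applied to the CC0 URI, it reports no custom terms.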

For other metadata exports, I think we removed places where CC0 was hardcoded and/or where it wasn't obvious how to map custom terms/the custom license URI into the format. (I think that was noted somewhere as 'future work' but I'm not aware of anyone having taken it up.)

W.r.t. specific datasets, in cases where some files are restricted/embargoed and the terms of use/other custom terms are just about the restricted files, it may work to just put those terms on the Terms of Access for Restricted Files field - you could then have a CC0 dataset with those files having the additional Terms of Access. For cases like you have here, which only has restricted files, I'm not sure that would actually help - the metadata is already open so saying the dataset is CC0 but you can't actually get any files under CC0 terms may not make sense. (FWIW: At QDR, we opted for a 'QDR Standard license' that codifies that documentation files are CC0 and other files (that are restricted or embargoed) are subject to QDR's terms.)

@jggautier
Contributor Author

jggautier commented Oct 18, 2022

Thanks @qqmyers. I think it might be helpful if I look at cases where a CC0 waiver was applied and text was entered in one of the fields in the "Dataset Terms" accordion, but there are no restricted files. This is kind of just a to-do note for myself. (Of course others who read this issue and would like to review and contribute other use cases should feel free.)

One of my concerns is that if people are looking for datasets with CC0 waivers, maybe using search facets (#9060), or in some other way looking for datasets with no or very few use restrictions, datasets like https://doi.org/10.7910/DVN/WHNXKY won't show up. I think the same might be true for datasets with a mix of standard licenses referenced in a "custom license". Will people want to search for datasets whose files have a CC0 waiver, even if some of those datasets' files have other licenses? Would they be able to?

Like you said, what's been entered in the Special Permissions field could be moved to the "Terms of Access for Restricted Files" field. Does that mean that any information about data access should be put in the "Terms of Access for Restricted Files" field or one of the other fields in the "Restricted Files + Terms of Access" accordion?

Screen Shot 2022-10-18 at 12 44 55 PM

Is the Dataverse software's model then that depositors who want to add information about data use should add it in fields in the "Dataset Terms" accordion, and that information about data access should go in the "Restricted Files + Terms of Access" fields?

The "Special Permissions" field is one of the fields in the "Dataset Terms" accordion and that field's description is "Determine if any special permissions are required to access a resource (e.g., if form is a needed and where to access the form)". That seems more about data access. The same seems true for the "Confidentiality Declaration" field.

To maintain the model, should those fields be moved to the group of fields in the "Terms of Access for Restricted Files" accordion?

What about the fields in that "Terms of Access for Restricted Files" accordion that mention both data use and access?

My other concern is something I've also heard from others in the community, that in 5.10+ installations, when depositors choose a standard license from the new dropdown, there's no longer a place for them to put information that they would normally put in fields in the "Dataset Terms" accordion because those fields disappear. I think @philippconzett made a related comment in some channel recently (a GitHub issue, Slack, Google Group?), though I can't find it now. And I've also heard about related complications from the Harvard Dataverse repo's curation team.

I think the community should invest time in reviewing how the functionality is working, and I hope these questions could help provide some direction.

@philippconzett
Contributor

Julian, thanks for raising this issue. You're right, I have earlier commented on this, but I can't find my comment either. The comment was about the following:

When researchers have reused data from other sources and want to publish a dataset that is derived from or builds on those sources, it is good practice to describe the sources and the Terms of Use or licenses under which they were (re)used. For CC BY licenses, this is even a legal requirement.

We're still running version 5.6 of Dataverse, so when choosing a license other than CC0, we add the Terms of Use or the standard license to the Terms of Use field in the Terms of Use tab. In the same field, we add a description of the Terms of Use or licenses under which the different sources were used. See, e.g., this dataset: https://doi.org/10.18710/VMUP44. Once we implement standard license support, we won't be able to add any text to the Terms of Use field anymore. So, the question is where to put this information. The metadata field Data Sources could be an alternative, which we have already used in the Terms of Use clean-up to prepare for the implementation of standard license support. But as you say, the community should invest time in agreeing on a best practice for this.

@sbarbosadataverse

sbarbosadataverse commented Apr 24, 2024

Investigate, prioritize and size*
Who does this investigating?

@sbarbosadataverse

also related: #8796

@cmbz cmbz added the GREI 2 Consistent Metadata label Apr 29, 2024
@cmbz

cmbz commented May 8, 2024

2024/05/08

  • There is a current issue that may address the OpenAIRE export part of this issue (see: Align or merge DataCite metadata exports #5889)
  • Main task is to identify the specific programming task needed; small internal group to start (Design Meeting)
  • Issue could be split into parts, some of which could involve more community discussion

@cmbz cmbz added UX/UI Input Needed Apply to issues involving UX or UI implications that need additional input Size: 10 A percentage of a sprint. 7 hours. labels May 8, 2024
@cmbz

cmbz commented Jun 20, 2024

2024/06/20

  • Next steps, needs additional review of work

@cmbz

cmbz commented Aug 28, 2024

Depends on: #10632. It must be merged first.

@pdurbin pdurbin added the Status: Waiting for Related Issues/PRs This issue depends upon the completion of one or more issues/PRs label Sep 11, 2024
@pdurbin pdurbin removed the Status: Waiting for Related Issues/PRs This issue depends upon the completion of one or more issues/PRs label Sep 25, 2024
@cmbz cmbz added the FY25 Sprint 7 FY25 Sprint 7 (2024-09-25 - 2024-10-09) label Sep 26, 2024
@sekmiller sekmiller assigned sekmiller and unassigned sekmiller Oct 3, 2024
8 participants