question on license of models #3

Closed
jwijffels opened this issue Jan 31, 2019 · 7 comments

@jwijffels

Hi,

I've got a question on the license of the models.
The UD treebanks are distributed under different licenses depending on each treebank (e.g. CC-BY-SA / CC-BY-NC-SA / some LGPL / ...)
Under what license do you distribute the models (which basically allow mimicking the UD databases)? Is it the same license as that of the UD treebank?

@J38
Collaborator

J38 commented Jan 31, 2019

I think they are under the Apache License, but I am not totally sure. As far as I know, the models are just parameter settings and can't be used to recreate the training data. But I'll ask the team about this and get back to you.

@jwijffels
Author

I think that if the UD treebank is licensed under https://creativecommons.org/licenses/by-sa/4.0, the model should also be distributed under that license. The model is a reproduction of the UD database, so it seems to me to be adapted material which falls under that CC-BY-SA license. Correct me if I'm wrong here.

@qipeng
Collaborator

qipeng commented Feb 7, 2019

@jwijffels We're still trying to figure out the exact licensing details for the models themselves (knowledge/precedents/best practices on model sharing seem scarce at this point), but in the meantime, we have added the treebank licenses to our model download table here for anyone interested.

@jwijffels
Author

jwijffels commented Feb 7, 2019

FYI, that is also the approach I used at https://github.com/bnosac/udpipe.models.ud and also the approach spaCy uses for all its models based on UD. This one uses another approach, https://github.com/datquocnguyen/RDRPOSTagger, which I personally think is wrong license-wise. The UDPipe C++ authors release all their models under CC-BY-SA-NC (https://github.com/ufal/udpipe).
I personally think that releasing a model under the same license as the UD treebank it was built upon is correct.

qipeng added the question label Feb 8, 2019
@jwijffels
Author

jwijffels commented Feb 8, 2019

Let me put the relevant parts of the CC-BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/legalcode), which most treebanks use, below:

  • Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material

  • Section 3.b: ShareAlike.

In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply.

The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License.

The list of CC-BY-SA compatible licenses is here: https://creativecommons.org/share-your-work/licensing-considerations/compatible-licenses/. The Apache License is not one of them.
So I think it is perfectly fine to release the code which you use to train the models under the Apache License, but the models themselves should be released under a license which is either the same as the treebank license or a CC-BY-SA compatible license, which is always copyleft (so not Apache).
If the model is built on a treebank which is released under CC-BY-SA-NC, that means the derived work (the models) should also be non-commercial.

@ccliu2

ccliu2 commented Mar 19, 2019

I have a stupid question about the licenses. The treebank I am interested in is released under CC-BY-SA-NC. I am affiliated with a commercial entity, but I am interested in using the model/treebank to create work that could be part of a research publication. Is this considered fair use, or non-commercial use?

@manning
Member

manning commented May 29, 2019

Thanks, everyone, for their interest and thoughts on this question! I have no legal training, but have spent a fair while reading about copyright, open source, and creative commons licenses over the years on various projects.

My best understanding is that there is at present no very clear answer to what the status of machine learning models trained from (variously licensed) underlying datasets is. As far as I know, there really isn't any clear, very similar existing case law. At most you have analogies from quite distant cases (Sega Genesis, anyone?) And to the extent that there is relevant precedent, its implications would likely vary according to the geographical region of the user, since copyright laws and recognition of database rights and moral rights vary significantly.

I'm aware of only two relevant published articles on this topic that are (co-)authored by people with legal training, so they're probably the best source of info:

I encourage everyone to read the full articles (!) but I think it is fair to summarize that the first one suggests that likely all ML model building on top of text corpora is okay, and there are no inherited legal restrictions, while the second is more wide-ranging and ambivalent for the full range of machine learning but pretty much concludes that the kind of non-expressive uses of ML that we are considering with parsing models likely do not violate copyright (while the situation may well be different for expressive uses, such as text, image, and music generation).

Here are a couple of other relevant web pages for lighter reading:

In particular, regarding @jwijffels's comments: It's just not clear whether the parts you cite apply to an ML model like a dependency parser model. As machine-generated and machine-read files of words and numbers, it's not at all clear that these models are "material subject to Copyright". If they are not, the ShareAlike requirement does not apply. Even if they were subject to copyright, at least U.S. courts have generously interpreted a category of non-expressive (transformative) fair use, which would likely cover the creation and use of these models. Note in particular that not even reasonable-length snippets of the original works can be recovered from our model files.

So, I think for the moment our position is:

  1. We will provide information on the copyright/licensing of each underlying treebank.
  2. We will state that, to the extent that we have ownership/rights over a language pack, it is made available under the Open Data Commons Attribution License v1.0.

Finally, I should probably emphasize that, while I am the Stanford faculty directing this project, everything written above is my own best understanding, and is not an official legal position of Stanford University.
