question on license of models #3
I think they are under the Apache License, but I am not totally sure. As far as I know, the models are just parameter settings and can't be used to recreate the training data. But I'll ask the team about this and get back to you.
I think that if the UD treebank is licensed under https://creativecommons.org/licenses/by-sa/4.0, the model should also be distributed under that license. The model is a reproduction of the UD database, so it seems to me to be adapted material, which falls under that CC-BY-SA license. Correct me if I'm wrong here.
@jwijffels We're still trying to figure out the exact licensing details for the models themselves (knowledge, precedents, and best practices on model sharing seem scarce at this point), but in the meantime we have added the treebank licenses to our model download table for anyone interested.
FYI, that is also the approach I used at https://github.com/bnosac/udpipe.models.ud, and it is also the approach spaCy uses for all its models based on UD. RDRPOSTagger (https://github.com/datquocnguyen/RDRPOSTagger) takes a different approach, which I personally think is wrong license-wise. The UDPipe C++ authors release all their models under CC-BY-NC-SA (https://github.com/ufal/udpipe).
Let me quote the relevant parts of the CC-BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/legalcode), which most treebanks use as their license:
The list of CC-BY-SA-compatible licenses is here: https://creativecommons.org/share-your-work/licensing-considerations/compatible-licenses/. The Apache License is not one of them.
I have a stupid question about the licenses. The treebank I am interested in is released under
Thanks, everyone, for your interest and thoughts on this question! I have no legal training, but I have spent a fair while reading about copyright, open source, and Creative Commons licenses over the years on various projects. My best understanding is that there is at present no clear answer on the legal status of machine learning models trained on (variously licensed) underlying datasets. As far as I know, there isn't any clear, closely analogous case law. At most you have analogies from quite distant cases (Sega Genesis, anyone?). And to the extent that there is relevant precedent, its implications would likely vary with the user's geographical region, since copyright laws and the recognition of database rights and moral rights vary significantly. I'm aware of only two relevant published articles on this topic that are (co-)authored by people with legal training, so they're probably the best source of info:
I encourage everyone to read the full articles (!), but I think it is fair to summarize them as follows. The first suggests that essentially all ML model building on top of text corpora is likely okay, with no inherited legal restrictions. The second is broader and more ambivalent across the full range of machine learning, but largely concludes that the kind of non-expressive uses of ML we are considering with parsing models likely do not violate copyright (although the situation may well differ for expressive uses such as text, image, and music generation). Here are a couple of other relevant web pages for lighter reading:
In particular, regarding @jwijffels' comments: it's just not clear whether the parts you cite apply to an ML model like a dependency parser model. Since these models are machine-generated, machine-read files of words and numbers, it's not at all clear that they are "material subject to Copyright". If they are not, the license's requirements do not apply. Even if they were subject to copyright, U.S. courts at least have generously interpreted a category of non-expressive (transformative) fair use, which would likely cover the creation and use of these models. Note in particular that not even reasonable-length snippets of the original works can be recovered from our model files. So, I think for the moment our position is:
Finally, I should probably emphasize that, while I am the Stanford faculty member directing this project, everything written above is my own best understanding and is not an official legal position of Stanford University.
Hi,
I have a question about the license of the models.
The UD treebanks are distributed under different licenses depending on the treebank (e.g. CC-BY-SA / CC-BY-NC-SA / some LGPL / ...).
Under what license do you distribute the models (which basically allow mimicking the UD databases)? Is it the same license as the UD treebank's?