What is the best way to add custom attributes to Tokens? #860
Comments
Hey, the API for this is relatively recent. You want the

    import spacy

    def get_user_id(token):
        # Values live on the parent Doc, keyed by (token index, attribute name).
        return token.doc.user_data.get((token.i, u'user_id'))

    def set_user_id(token, value):
        token.doc.user_data[(token.i, u'user_id')] = value

    nlp = spacy.load('en')
    doc = nlp(u'I like billy90210')
    # Token 2 is 'billy90210'.
    doc.user_data[(2, u'user_id')] = u'e7f67231'
    for token in doc:
        print(token.text, get_user_id(token))

To make the functionality feel more "native", we'd like to add a property to the token. Unfortunately there's no generic support for this in the code at the moment. This should probably change: we should allow you to use the Python descriptor protocol, so you can write a custom getter/setter. The simple case of this sort of key association should also be covered.

One solution would be for you to just compile a fork, with the attribute hard-coded onto the Token. This isn't a bad solution, as your changes will almost surely merge cleanly each time.

Another solution would be to subclass the

Finally, a thing to note: don't write directly onto the
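As a rough illustration of the descriptor idea mentioned above, here is a minimal sketch. It is not spaCy's API: `UserDataAttribute`, `SketchDoc` and `SketchToken` are hypothetical stand-ins written in plain Python so the example runs without spaCy installed. The only assumption borrowed from the thread is that per-token values are kept in a `user_data` dict on the document, keyed by `(token.i, name)`.

```python
class UserDataAttribute:
    """Descriptor that keeps a per-token value in the owning doc's user_data dict."""

    def __init__(self, name):
        self.name = name

    def __get__(self, token, owner=None):
        if token is None:
            return self  # accessed on the class rather than an instance
        return token.doc.user_data.get((token.i, self.name))

    def __set__(self, token, value):
        token.doc.user_data[(token.i, self.name)] = value


# Hypothetical stand-ins for a document and its tokens. NOT spaCy classes.
class SketchDoc:
    def __init__(self, words):
        self.words = list(words)
        self.user_data = {}

    def __getitem__(self, i):
        return SketchToken(self, i)

    def __iter__(self):
        return (SketchToken(self, i) for i in range(len(self.words)))


class SketchToken:
    # The descriptor makes user_id read and write through to doc.user_data.
    user_id = UserDataAttribute("user_id")

    def __init__(self, doc, i):
        self.doc = doc
        self.i = i

    @property
    def text(self):
        return self.doc.words[self.i]


doc = SketchDoc(["I", "like", "billy90210"])
doc[2].user_id = "e7f67231"
ids = [(token.text, token.user_id) for token in doc]
print(ids)
```

Because the value is stored on the document and not on the token object, it survives even though every `doc[i]` access in this sketch creates a fresh token object.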
Thanks, Matt. So, till this becomes more native, E.g., it doesn't look from the source that anything special is done with And thanks for the tip on views to the
Correct, it's a free-form dict. It's also not serialised at the moment, unfortunately.
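Since the dict is not serialised with the document, one option is to persist it separately. The following is a minimal sketch under stated assumptions: keys are `(token index, name)` tuples as in the example earlier in the thread, values are JSON-compatible, and the helper names `dump_user_data` and `load_user_data` are hypothetical, not part of spaCy.

```python
import json


def dump_user_data(user_data):
    # JSON object keys must be strings, so store each entry as a
    # [token_index, name, value] record instead of using tuple keys.
    records = [[i, name, value] for (i, name), value in user_data.items()]
    return json.dumps(records)


def load_user_data(payload):
    # Rebuild the (token_index, name) tuple keys from the stored records.
    return {(i, name): value for i, name, value in json.loads(payload)}


user_data = {(2, "user_id"): "e7f67231"}
payload = dump_user_data(user_data)
restored = load_user_data(payload)
print(restored)
```

The restored dict can then be assigned back onto a freshly parsed document, provided the tokenization is the same so the token indices still line up.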
Thank you. Will close this for now and keep an eye out for tighter integration into tokens.
Is there another way to do this now? Say I want to add my own word embeddings to a token, preferably as a property that makes a lookup in my matrix. I would like to have something like:
This use case does not seem to go well with the Edit:
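For the embedding lookup asked about here, one possible workaround is a plain helper function that looks the vector up on demand instead of storing it per token. This is only a sketch: the table `EMBEDDINGS`, the index `ROW_FOR_WORD`, the fallback `UNKNOWN` and the function `get_embedding` are hypothetical names with made-up values, and a real matrix would more likely be a NumPy array.

```python
# Hypothetical embedding table: one row per known word (values are made up).
EMBEDDINGS = [
    [0.1, 0.2, 0.3],  # row 0: "like"
    [0.4, 0.5, 0.6],  # row 1: "billy90210"
]
ROW_FOR_WORD = {"like": 0, "billy90210": 1}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback vector for words missing from the table


def get_embedding(token_text):
    """Return the embedding row for a token's text, or the fallback vector."""
    row = ROW_FOR_WORD.get(token_text)
    return UNKNOWN if row is None else EMBEDDINGS[row]


vectors = [get_embedding(word) for word in ["I", "like", "billy90210"]]
print(vectors)
```

With spaCy tokens the call would be `get_embedding(token.text)`; nothing is written onto the token, so the matrix stays the single source of truth.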
We need to add domain-specific annotations to Token instances after spaCy's parsing. Being able to add attributes to the Token class so that we can continue to use spaCy's Doc/Token/Span/etc. constructs would be very clean.
Is this feasible to do from within Python given that Tokens are cdefs in cython? If not, is there another way to achieve something similar?