set vector of merged entity #5085
@svlandeg: do you think this is something the retokenizer should handle more generally?
Yea, perhaps? I was doubting where to put it. You think it would be better moved to the |
I think it might make more sense to have it near spaCy/spacy/tokens/_retokenize.pyx (lines 197 to 202 in 2281c47).
I was a little worried initially that it would be tricky to add new vectors if the table is full, but I see that it gets resized automatically. One downside if it's not optional is that the size of the vectors could keep growing and growing, e.g., in a situation like #5083? I'm not sure this is a major concern, but I was just looking at all the places that can grow...
Ok I moved it. The issue of the growing |
Closes #5082
Description
When using the `merge_entities` pipe, the vector of the merged `token` is now set to the vector of the original `span` (i.e. the average of the original tokens' vectors). Previously it was just a zero vector.
Types of change
enhancement
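As a rough illustration of the averaging mentioned in the description, here is a minimal pure-Python sketch. It does not call spaCy; the helper name `span_vector` and the toy three-dimensional vectors are assumptions made up for this example, not part of the PR.

```python
from statistics import fmean


def span_vector(token_vectors):
    """Return the component-wise average of equal-length token vectors.

    Hypothetical helper for illustration only; it mirrors the idea that the
    merged token should carry the average of the original tokens' vectors.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    # zip(*...) groups the i-th component of every token vector together.
    return [fmean(component) for component in zip(*token_vectors)]


# Toy vectors for a two-token entity (values are made up).
token_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
]

merged = span_vector(token_vectors)
zero = [0.0] * len(merged)  # what the merged token carried before this change

print(merged)
```

With these toy inputs, the merged vector is the midpoint of the two token vectors rather than all zeros.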
Checklist