💫Fix SP tag, tweak Vectors.__init__, fix Morphology #1442
Merged
Re #1052, #683
This patch mixes a few changes that aren't obviously related, but ended up having inter-dependencies. Not wonderful.
The main motivation was to fix the requirement that each tag map must specify the `SP` tag. This tag is used internally by spaCy, so it's weird to make the data provide it. However, previous efforts to add it automatically threw the class ID mapping out for the tagger, because `SP` was inserted somewhere in the middle of the tag list.

To address this, I've renamed the `SP` tag to `_SP`, to denote it's a hard-coded value. This also means it sorts to the end of the ordinary tags.

While making this change I came across a bug in the way the `Vectors` class would be created when there were strings, but no vocabulary items. In fixing this, I also fixed a wart in the `Vectors.__init__` API: instead of having one argument `data_or_width`, there are now two keyword arguments.

Finally, making changes to `morphology.pyx` led me to an issue that caused very slow compilation of this module. The problem was due to a `cpdef enum` declaration in `morphology.pxd`, which causes a lot of code to be generated in recent versions of Cython.

Clearly I should've made some of these patches on other branches. I'll improve my branch bureaucracy in future.
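
For anyone wondering why the rename matters for the tagger's class IDs, here's a rough illustration of the ordering problem. This is not spaCy's actual code: `build_tag_ids` is a hypothetical helper, the tag names are made up, and it assumes IDs are assigned by position in the sorted tag list.

```python
def build_tag_ids(tag_names):
    """Hypothetical stand-in: assign class IDs by position in the sorted tag list."""
    return {tag: i for i, tag in enumerate(sorted(tag_names))}


# Tags as a tag map might supply them (illustrative names only).
data_tags = ["ADJ", "NOUN", "PUNCT", "VERB"]

# IDs the tagger would have seen without any automatically added tag.
trained_ids = build_tag_ids(data_tags)

# Adding "SP" automatically: it sorts between "PUNCT" and "VERB".
with_sp = build_tag_ids(data_tags + ["SP"])

# Adding "_SP" instead: "_" sorts after the uppercase ASCII letters.
with_underscore_sp = build_tag_ids(data_tags + ["_SP"])

# Tags whose ID changed once "SP" was inserted.
shifted = [t for t in data_tags if trained_ids[t] != with_sp[t]]

# True if every ordinary tag keeps its ID when "_SP" is added.
stable = all(trained_ids[t] == with_underscore_sp[t] for t in data_tags)

print("shifted by SP:", shifted)
print("stable with _SP:", stable)
```

With uppercase tag names, `_SP` always lands after the ordinary tags, so their IDs don't move. A tag map that used lowercase names would not get this guarantee from plain string sorting.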