Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

💫Fix SP tag, tweak Vectors.__init__, fix Morphology #1442

Merged
merged 8 commits into from
Oct 24, 2017

Conversation

honnibal
Copy link
Member

@honnibal honnibal commented Oct 20, 2017

Re #1052, #683

This patch mixes a few changes that aren't obviously related, but ended up having inter-dependencies. Not wonderful.

The main motivation was to fix the requirement that each tag map must specify the SP tag. This tag is used internally by spaCy, so it's weird to make the data provide it. However, previous efforts to add it automatically threw the class ID mapping out for the tagger, because SP was inserted somewhere in the middle of the tag list.

To address this, I've renamed the SP tag to _SP, to denote it's a hard-coded value. This also means it sorts to the end of the ordinary tags.

While making this change I came across a bug in the way the Vectors class would be created when there were strings, but no vocabulary items. In fixing this, I also fixed a wart in the Vectors.__init__ API: instead of having one argument data_or_width, there's no two keyword arguments.

Finally, making changes to morphology.pyx led me to an issue that caused very slow compilation of this module. The problem was due to a cpdef enum declaration in morphology.pxd, which causes a lot of code to be generated in recent versions of Cython.

Clearly I should've made some of these patches on other branches. I'll improve my branch bureaucracy in future.

Types of changes

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality to spaCy)
  • Breaking change (fix or feature causing change to spaCy's existing functionality)
  • Documentation (addition to documentation of spaCy)

Checklist:

  • My change requires a change to spaCy's documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
Previously the data and width options were one argument in Vectors,
which meant you couldn't say vectors = Vectors(strings, width=300).
It's better to have two keywords.
@ines ines changed the title Fix SP tag, tweak Vectors.__init__, fix Morphology 💫Fix SP tag, tweak Vectors.__init__, fix Morphology Oct 21, 2017
@ines ines added the 🌙 nightly Discussion and contributions related to nightly builds label Oct 21, 2017
@honnibal honnibal merged commit ef3e5a3 into develop Oct 24, 2017
@ines ines deleted the feature/fix-sp branch October 25, 2017 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🌙 nightly Discussion and contributions related to nightly builds
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants