spaCy-talk-demo

Slides and Jupyter Notebook for PyData Vancouver talk (25-02-2020)

In this talk, we first review spaCy's basic usage:

  • Loading models for different languages.
  • Identifying special tokens: punctuation, stopwords, etc.
  • Computing part-of-speech (POS) tags, dependency parses, and named entities (NER).
  • Visualizing dependencies and named entities, along with tag explanations.
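
As a rough illustration of the basic usage listed above (not taken from the talk's notebook), the sketch below flags punctuation and stopwords and looks up a tag explanation. It assumes spaCy v3 and uses a blank English pipeline so that no model download is needed; the sentence and variable names are made up for the example. POS tags, dependency parses, and named entities would additionally require a trained pipeline such as `en_core_web_sm`.

```python
import spacy

# Assumption: spaCy v3 is installed. A blank pipeline only tokenizes,
# which is enough for lexical flags such as is_punct and is_stop.
nlp = spacy.blank("en")
doc = nlp("This is a sentence, and it has punctuation!")

# Lexical attributes are available on every token without a trained model.
punctuation = [token.text for token in doc if token.is_punct]
stopwords = [token.text for token in doc if token.is_stop]

print("Punctuation:", punctuation)
print("Stopwords:", stopwords)

# spacy.explain returns a short description for a tag or label name.
explanation = spacy.explain("PROPN")
print("PROPN:", explanation)
```

With a trained pipeline loaded through `spacy.load`, the same loop could also read `token.pos_`, `token.dep_`, and `doc.ents`.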

In the second half, we take a stab at customizing spaCy pipelines to solve two tasks:

  • Create a tokenizer that does not split hyphenated words.
  • Create a simple noun chunker that identifies only consecutive nouns, optionally preceded by a single adjective.
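
The two tasks above could be approached along the following lines. This is a minimal sketch assuming spaCy v3, not necessarily the method used in the talk: the tokenizer drops the default infix rule that splits on hyphens, and the chunker uses a `Matcher` pattern for an optional adjective followed by one or more nouns. The POS tags are assigned by hand here so that the example runs without a trained model; the sentences and the `NOUN_CHUNK` label are illustrative.

```python
import spacy
from spacy.lang.char_classes import HYPHENS
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.util import compile_infix_regex, filter_spans

nlp = spacy.blank("en")

# Task 1: keep hyphenated words together.
# Drop every default infix pattern that is built from the HYPHENS character class,
# then recompile the remaining patterns into the tokenizer.
infixes = [pattern for pattern in nlp.Defaults.infixes if HYPHENS not in pattern]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

tokens = [t.text for t in nlp("A well-known state-of-the-art parser.")]
print(tokens)

# Task 2: chunk consecutive nouns, optionally preceded by a single adjective.
matcher = Matcher(nlp.vocab)
matcher.add("NOUN_CHUNK", [[{"POS": "ADJ", "OP": "?"}, {"POS": "NOUN", "OP": "+"}]])

# Hand-tagged document (an assumption for the demo; a trained tagger would
# normally supply these POS values).
words = ["The", "fast", "web", "server", "handles", "many", "big", "data", "requests"]
pos = ["DET", "ADJ", "NOUN", "NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# The matcher returns overlapping candidates; filter_spans keeps the longest
# non-overlapping ones.
chunks = [span.text for span in filter_spans(matcher(doc, as_spans=True))]
print(chunks)
```

Because the pattern allows at most one adjective, "many" is left out of the second chunk while "big" is kept.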