This release includes support for tagging, parsing, tokenizing, sentence splitting and lemmatizing raw text.
It was evaluated during the CoNLL Shared Task on Universal Dependencies Parsing and has pretrained language models for the entire UD Corpus.
Features
- Model store with pretrained models for selected languages
- Training pipeline for building custom models
- Supports multiple language models: transformer, fasttext, languasito, dummy (no embeddings)
- Updated models with large improvements in the F-Score
- Flavours: build a joint model using multiple treebanks at the same time, with language code conditioning (increases performance in most cases)
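
To illustrate the idea behind flavours, here is a minimal sketch of how several treebanks could be merged into one training set while keeping a language code on every sentence. This is not the project's actual training code: the names `build_joint_corpus` and `build_lang_index` and the data layout are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a sentence is a list of tokens,
# and a treebank is a list of sentences keyed by language code.
Sentence = List[str]


def build_joint_corpus(
    treebanks: Dict[str, List[Sentence]]
) -> List[Tuple[str, Sentence]]:
    """Merge several treebanks into a single list of
    (language_code, sentence) pairs."""
    joint = []
    # Sort by language code so the result is deterministic.
    for lang_code, sentences in sorted(treebanks.items()):
        for sentence in sentences:
            joint.append((lang_code, sentence))
    return joint


def build_lang_index(joint: List[Tuple[str, Sentence]]) -> Dict[str, int]:
    """Map each language code to an integer id, e.g. for looking up
    a language embedding that the model is conditioned on."""
    codes = sorted({code for code, _ in joint})
    return {code: i for i, code in enumerate(codes)}


if __name__ == "__main__":
    treebanks = {
        "ro": [["Ana", "are", "mere"]],
        "en": [["I", "like", "apples"], ["Hello"]],
    }
    joint = build_joint_corpus(treebanks)
    print(len(joint), build_lang_index(joint))
```

In a real joint model, the integer id would typically select a learned language embedding that is fed to the network alongside the word representations. That part is omitted here.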