Kyrgyz language processing software, models and datasets.

golden-ratio/awesome-kyrgyz-nlp

 
 


Awesome Kyrgyz NLP

A curated list of awesome Kyrgyz language processing software, models and datasets. Inspired by awesome-ML.

The main focus is on open source tools, downloadable data and research papers with code.

If you want to contribute to this list (please do), send me a pull request. A listed repository should be tagged as deprecated if:

  • The repository's owners explicitly state that "this library is not maintained".
  • It has not been committed to for a long time (2-3 years).

Datasets

The repository currently consists of 80213 images (50x50 pixels) representing all 36 letters of the Kyrgyz alphabet. The images are handwritten.

Raw text

  • kloop corpus: 16'826 articles (sqlite3 DB file) + crawler code
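Because the kloop corpus ships as an sqlite3 DB file, a reasonable first step is to check which tables and columns it contains. The sketch below uses only Python's standard library. The table layout is not documented in this list, so the helper (`describe_sqlite_db`, a name made up for this example) discovers the schema instead of assuming one.

```python
import sqlite3


def describe_sqlite_db(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    layout = {}
    for (name,) in tables:
        # Quote the identifier so unusual table names do not break the PRAGMA.
        quoted = '"' + name.replace('"', '""') + '"'
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        layout[name] = [col[1] for col in columns]
    return layout
```

With the downloaded file at a hypothetical path such as `kloop.db`, calling `describe_sqlite_db(sqlite3.connect("kloop.db"))` should list the tables to query for article text.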

Several corpora are also mentioned in research works:

  • TODO

Syntax

Machine-readable dictionaries

Pretrained models

  • Polyglot morfessor — pretrained morfessor model, number 6
  • fastText — 300-dimensional fastText vectors provided by the authors: bin, txt.
  • BERT-based NER: bert-base-multilingual-cased fine-tuned on Wikiann for NER on Kyrgyz. The author warns that this model is not usable and was built only as a proof of concept. It will be updated later.
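For the fastText entry above, the txt download can be read without the fastText library. This is a minimal sketch that assumes the usual fastText text layout: a header line with the vocabulary size and dimension, followed by one `word v1 ... vN` row per word. The function names are illustrative and not part of any library.

```python
import math


def load_text_vectors(lines, limit=None):
    """Parse fastText-style text vectors from an iterable of lines.

    Returns (dim, {word: [floats]}). `limit` caps the number of words kept,
    which helps because the full file is large.
    """
    lines = iter(lines)
    header = next(lines).split()
    dim = int(header[1])  # header is "<vocab_size> <dim>"
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return dim, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

A file object opened with `encoding="utf-8"` can be passed directly as `lines`. For the 300-dimensional vectors, `dim` should come back as 300.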

Methods/Software

  • spaCy basic support: tokenization, stopwords, like_num

Morphology

Mentioned in papers:

  • TODO

Hate Speech detection

Other

Online Demos

Miscellaneous

