NLP-packages

A list of packages developed with a focus on natural language processing (NLP).

Python

aitextgen - A robust Python tool for text-based AI training and generation using GPT-2 [site].
AllenNLP - An open-source NLP research library, built on PyTorch [site].
BERTopic - Leveraging BERT and c-TF-IDF to create easily interpretable topics [site].
BigARTM - Fast topic modeling platform [site].
ChatterBot - a machine learning, conversational dialog engine for creating chat bots [site].
clean-text - package for text cleaning.
cltk - The Classical Language Toolkit [site].
ColossalAI - Making large AI models cheaper, faster and more accessible [site].
conTEXT-explorer - An open, web-based system for exploring and visualizing concepts (combinations of co-occurring words and phrases) over time in text documents.
contextualized-topic-models - package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics.
DeText - A Deep Neural Text Understanding Framework for Ranking and Classification Tasks.
dl-translate - A deep learning-based translation library built on Hugging Face Transformers.
ecco - Explain, analyze, and visualize NLP language models [site].
flair - A very simple framework for state-of-the-art Natural Language Processing (NLP).
flashtext - Extract keywords from sentences or replace keywords in sentences.
ftfy - ftfy (fixes text for you) fixes mojibake and other glitches in Unicode text, after the fact [site].
gluon-nlp - A toolkit that helps you solve NLP problems [site].
Gensim - Topic Modelling for Humans [site].
Gramformer - A framework for detecting, highlighting and correcting grammatical errors in natural language text.
HanLP - The multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x, for advancing state-of-the-art deep learning techniques in both academia and industry [site].
haystack - framework to interact with your data using Transformer models and LLMs [site].
interpret-text - A library that incorporates state-of-the-art explainers for text-based machine learning models and visualizes the result with a built-in dashboard.
intertext - Detect and visualize text reuse [site].
jury - Comprehensive NLP Evaluation System.
ktrain - library that makes deep learning and AI more accessible and easier to apply.
langchain - Building applications with LLMs through composability.
llama-cpp-python - Python bindings for llama.cpp.
lexical_diversity - package for calculating a variety of lexical diversity indices.
lexrank - LexRank algorithm for text summarization.
multi_rake - Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python.
Multilingual Latent Dirichlet Allocation - A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.
multiplex-plot - A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.
neattext - a simple NLP package for cleaning textual data and text preprocessing.
news-graph - Key information extraction from text and graph visualization.
NLTK - Natural Language Toolkit [site].
NLP-Cube - Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing [site].
nlpaug - Data augmentation for NLP.
nlpnet - A neural network architecture for NLP tasks, using Cython for fast performance. Currently, it can perform POS tagging, SRL and dependency parsing.
nlu - One line of code for thousands of state-of-the-art NLP models in hundreds of languages; described by its authors as the fastest and most accurate way to solve text problems [site].
OpenKiwi - Open-Source Machine Translation Quality Estimation in PyTorch [site].
ParlAI - A framework for training and evaluating AI models on a variety of openly available dialogue datasets [site].
Parrot - A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines.
pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization [wiki].
polyglot - Multilingual text (NLP) processing toolkit [site].
pyhunspell - Python bindings for the Hunspell spellchecker engine.
PyNLPl - Python Natural Language Processing Library.
pysentimiento - Multilingual toolkit for Sentiment Analysis and Social NLP tasks.
PySS3 - A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI) [site].
pytextrank - Python implementation of TextRank algorithms for phrase extraction [site].
PyTorch-NLP - Basic Utilities for PyTorch Natural Language Processing [site].
pywsd - Implementations of Word Sense Disambiguation (WSD) Technologies.
rasa - Open source machine learning framework to automate text- and voice-based conversations [site].
rosetta - Tools and wrappers for data science with a concentration on text processing.
scattertext - Beautiful visualizations of how language differs among document types.
sense2vec - Contextually-keyed word vectors [site].
sentence-transformers - Multilingual Sentence & Image Embeddings with BERT [site].
sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.
small-text - Active Learning for Text Classification in Python.
spaCy - Industrial-strength Natural Language Processing (NLP) in Python [site].
spacy-stanza - Use the latest Stanza (StanfordNLP) research models directly in spaCy.
Spark NLP - State of the Art Natural Language Processing [site].
Stanza - Official Stanford NLP Python Library for Many Human Languages [site].
texar - Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow.
TextAttack - a Python framework for adversarial attacks, data augmentation, and model training in NLP [site].
textacy - NLP, before and after spaCy [site].
TextBlob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more [site].
TextBox - a text generation library with pre-trained language models.
TextFeatureSelection - A library for selecting features from text data.
texthero - Text preprocessing, representation and visualization from zero to hero [site].
textkit - Command line tool for manipulating and analyzing text [site].
textnets - Text analysis with networks [site].
textplot - maps of texts with kernel density estimation and force-directed networks.
textstat - A Python package to calculate readability statistics for text (paragraphs, sentences, articles).
text2text - Crosslingual NLP/G toolkit.
tomotopy - Python package of Tomoto, the Topic Modeling Tool [site].
topic modelling tools - Topic Modelling with Latent Dirichlet Allocation using Gibbs sampling.
torchtext - Data loaders and abstractions for text and NLP [site].
trankit - a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing.
transformers - Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX [site].
txtai - Build AI-powered semantic search applications [site].
verbecc - Complete Conjugation of any Verb using Machine Learning for French, Spanish, Portuguese, Italian and Romanian.
wordfreq - Access a database of word frequencies, in various natural languages.
wordseer - text analysis tool, written in Flask [site].
wordtree - A Python library for generating word tree diagrams.
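Several entries above (multi_rake, flashtext, pytextrank) deal with keyword or phrase extraction. As a rough illustration of the idea behind Rapid Automatic Keyword Extraction (RAKE), here is a minimal from-scratch sketch in plain Python. It is not the multi_rake API: the tiny `STOPWORDS` set and the function names `candidate_phrases` and `rake_keywords` are hypothetical, chosen only for this example, and tokenization is ASCII-only for brevity. Candidate phrases are runs of words between punctuation and stopwords; each word is scored as degree / frequency, and each phrase as the sum of its word scores.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list; real RAKE setups use a full list.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "from", "in", "is", "of",
             "on", "or", "the", "to", "with"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Split on clause/sentence punctuation first, then on stopwords inside each fragment.
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):  # ASCII-only, for brevity
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text, top_n=5):
    """Rank candidate phrases by RAKE-style degree/frequency word scores."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree: number of words this word co-occurs with in phrases (itself included).
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Score each distinct phrase as the sum of its word scores.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    text = ("Keyword extraction is a core task in natural language processing. "
            "Rapid automatic keyword extraction scores candidate phrases.")
    for phrase, score in rake_keywords(text, top_n=3):
        print(f"{score:5.1f}  {phrase}")
```

Because a phrase's score is a sum over its words, RAKE tends to favour longer phrases; production libraries add full stopword lists, language-specific tokenization and phrase-length limits on top of this core scoring.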

R

BTM - Biterm Topic Models for Short Text [cran].
cleanNLP - Package providing annotators and a normalized data model for natural language processing [cran].
CRAN Task View: Natural Language Processing.
corporaexplorer - An R package for dynamic exploration of text collections [cran], [site].
dfrtopics - package for exploring topic models of text.
koRpus - An R Package for Text Analysis [cran].
hunspell - High-Performance Stemmer, Tokenizer, and Spell Checker for R [cran], [site].
languageR - Analyzing Linguistic Data.
lda - Collapsed Gibbs Sampling Methods for Topic Models.
ldatuning - Tuning of LDA model parameters [cran].
lsa - Latent Semantic Analysis.
NLP - Basic classes and methods for Natural Language Processing.
openNLP - An interface to the Apache OpenNLP tools.
pattern.nlp - R package to perform sentiment analysis and Parts of Speech tagging for Dutch/French/English/German/Spanish/Italian.
quanteda - package for the Quantitative Analysis of Textual Data [cran], [site].
RKEA - interface to KEA (Keyphrase Extraction Algorithm).
r-corpus - Text corpus analysis in R [cran].
RMallet - An R Wrapper for the Java Mallet Topic Modeling Toolkit [cran].
sentencepiece - R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece [cran].
SnowballC - Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library.
spacyr - R wrapper to spaCy NLP [cran], [site].
stm - Estimation of the Structural Topic Model [cran], [site].
stopwords - Multilingual Stopword Lists in R [cran], [site].
stringi - Fast and portable character string processing in R (with the Unicode ICU) [cran], [site].
stringr - A fresh approach to string manipulation in R [cran], [site].
tau - Text Analysis Utilities.
Text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R [cran], [site].
textnets - R package to perform automated text analysis using network techniques.
textplot - Functionalities to easily visualise complex relations in texts [cran].
textplot - Plotting for text data.
textreuse - Detect text reuse and document similarity [cran], [site].
tidytext - Text mining using tidy tools [cran], [site].
tm - A framework for text mining applications within R.
tokenizers - Fast, Consistent Tokenization of Natural Language Text [cran], [site].
topicdoc - Topic-Specific Diagnostics for LDA and CTM Topic Models [cran], [site].
topicmodels - Latent Dirichlet Allocation (LDA) models and Correlated Topics Models (CTM).
udpipe - package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit [cran], [site].
wordcloud - Functionality to create pretty word clouds, visualize differences and similarity between documents, and avoid over-plotting in scatter plots with text.
wordnet - WordNet Interface.
wordVectors - package for building and exploring word embedding models.
zipfR - Statistical Models for Word Frequency Distributions [site].
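Packages such as koRpus (R) and lexical_diversity (Python) compute lexical diversity indices. To show what two of the simplest indices measure, here is a minimal from-scratch sketch of the type-token ratio (TTR) and its moving-average variant (MATTR), written in Python. It does not reproduce either package's API; the function names, the lowercase `\w+` tokenizer and the default window of 50 tokens are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"\w+", text.lower())


def ttr(tokens):
    """Type-token ratio: number of distinct word types divided by total tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens.

    Falls back to plain TTR when the text is not longer than the window.
    Recomputes each window from scratch (O(n * window)), which is fine for a sketch.
    """
    if len(tokens) <= window:
        return ttr(tokens)
    ratios = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The end!")
    print(ttr(tokens), mattr(tokens, window=4))
```

Plain TTR drops as a text gets longer, because frequent words keep repeating; averaging TTR over fixed-size windows is one way to make scores more comparable across texts of different lengths.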

Julia

CorpusLoaders - A variety of loaders for various NLP corpora.
Embeddings - Functions and data dependencies for loading various word embeddings (Word2Vec, FastText, GLoVE).
Languages - A package for working with human languages.
Snowball - Snowball stemming algorithms.
StringAnalysis - Hard fork of JuliaText/TextAnalysis.jl.
TextAnalysis - Julia package for text analysis.
TextModels - Neural Network based models for Natural Language Processing.
WordLists - Dictionaries without definitions.
WordNet - A Julia package for Princeton's WordNet.
WordTokenizers - High performance tokenizers for natural language processing and other related tasks.
Word2Vec - Julia interface to word2vec.

JavaScript

compromise - modest natural-language processing [site].
natural - General natural language facilities for Node.js.
nlp.js - An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more.
wink-nlp-utils - NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more [site].

Java

CoreNLP - Stanford CoreNLP: A Java suite of core NLP tools [site].
Mallet - package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text [site].
OpenNLP - The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text [site].

Go

The natural language processing section of the Awesome Go list.

ajdavidl/NLP-packages