NLP-packages

A list of packages developed with a focus on natural language processing.

Python

  • aitextgen - A robust Python tool for text-based AI training and generation using GPT-2 [site].
  • AllenNLP - An open-source NLP research library, built on PyTorch [site].
  • BERTopic - Leveraging BERT and c-TF-IDF to create easily interpretable topics [site].
  • BigARTM - Fast topic modeling platform [site].
  • ChatterBot - a machine learning, conversational dialog engine for creating chat bots [site].
  • clean-text - package for text cleaning.
  • cltk - The Classical Language Toolkit [site].
  • ColossalAI - Making large AI models cheaper, faster and more accessible [site].
  • conTEXT-explorer - open Web-based system for exploring and visualizing concepts (combinations of occurring words and phrases) over time in the text documents.
  • contextualized-topic-models - package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics.
  • DeText - A Deep Neural Text Understanding Framework for Ranking and Classification Tasks.
  • dl-translate - A deep learning-based translation library built on Huggingface transformers.
  • ecco - Explain, analyze, and visualize NLP language models [site].
  • flair - A very simple framework for state-of-the-art Natural Language Processing (NLP).
  • flashtext - Extract Keywords from sentence or Replace keywords in sentences.
  • ftfy - ftfy (fixes text for you) fixes mojibake and other glitches in Unicode text, after the fact [site].
  • gluon-nlp - A toolkit that helps you solve NLP problems [site].
  • Gensim - Topic Modelling for Humans [site].
  • Gramformer - A framework for detecting, highlighting and correcting grammatical errors on natural language text.
  • HanLP - The multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x, for advancing state-of-the-art deep learning techniques in both academia and industry [site].
  • haystack - framework to interact with your data using Transformer models and LLMs [site].
  • interpret-text - A library that incorporates state-of-the-art explainers for text-based machine learning models and visualizes the result with a built-in dashboard.
  • intertext - Detect and visualize text reuse [site].
  • jury - Comprehensive NLP Evaluation System.
  • ktrain - library that makes deep learning and AI more accessible and easier to apply.
  • langchain - Building applications with LLMs through composability.
  • llama-cpp-python - Python bindings for llama.cpp.
  • lexical_diversity - package for calculating a variety of lexical diversity indices.
  • lexrank - LexRank algorithm for text summarization.
  • multi_rake - Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python.
  • Multilingual Latent Dirichlet Allocation - A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.
  • multiplex-plot - A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.
  • neattext - a simple NLP package for cleaning textual data and text preprocessing.
  • news-graph - Key information extraction from text and graph visualization.
  • NLTK - Natural Language Toolkit [site].
  • NLP-Cube - Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing [site].
  • nlpaug - Data augmentation for NLP.
  • nlpnet - A neural network architecture for NLP tasks, using cython for fast performance. Currently, it can perform POS tagging, SRL and dependency parsing.
  • nlu - 1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems [site].
  • OpenKiwi - Open-Source Machine Translation Quality Estimation in PyTorch [site].
  • ParlAI - A framework for training and evaluating AI models on a variety of openly available dialogue datasets [site].
  • Parrot - A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines.
  • pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization [wiki].
  • polyglot - Multilingual text (NLP) processing toolkit [site].
  • pyhunspell - Python bindings for the Hunspell spellchecker engine.
  • PyNLPl - Python Natural Language Processing Library.
  • pysentimiento - Multilingual toolkit for Sentiment Analysis and Social NLP tasks.
  • PySS3 - A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI) [site].
  • pytextrank - Python implementation of TextRank algorithms for phrase extraction [site].
  • PyTorch-NLP - Basic Utilities for PyTorch Natural Language Processing [site].
  • pywsd - Implementations of Word Sense Disambiguation (WSD) Technologies.
  • rasa - Open source machine learning framework to automate text- and voice-based conversations [site].
  • rosetta - Tools and wrappers for data science with a concentration on text processing.
  • scattertext - Beautiful visualizations of how language differs among document types.
  • sense2vec - Contextually-keyed word vectors [site].
  • sentence-transformers - Multilingual Sentence & Image Embeddings with BERT [site].
  • sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.
  • small-text - Active Learning for Text Classification in Python.
  • spaCy - Industrial-strength Natural Language Processing (NLP) in Python [site].
  • spacy-stanza - Use the latest Stanza (StanfordNLP) research models directly in spaCy.
  • Spark NLP - State of the Art Natural Language Processing [site].
  • Stanza - Official Stanford NLP Python Library for Many Human Languages [site].
  • texar - Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow.
  • TextAttack - a Python framework for adversarial attacks, data augmentation, and model training in NLP [site].
  • textacy - NLP, before and after spaCy [site].
  • TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more [site].
  • TextBox - a text generation library with pre-trained language models.
  • TextFeatureSelection - library for feature selection for text features.
  • texthero - Text preprocessing, representation and visualization from zero to hero [site].
  • textkit - Command line tool for manipulating and analyzing text [site].
  • textnets - Text analysis with networks [site].
  • textplot - maps of texts with kernel density estimation and force-directed networks.
  • textstat - Python package to calculate readability statistics of a text object (paragraphs, sentences, articles).
  • text2text - Crosslingual NLP/G toolkit.
  • tomotopy - Python package of Tomoto, the Topic Modeling Tool [site].
  • topic modelling tools - Topic Modelling with Latent Dirichlet Allocation using Gibbs sampling.
  • torchtext - Data loaders and abstractions for text and NLP [site].
  • trankit - a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing.
  • transformers - Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX [site].
  • txtai - Build AI-powered semantic search applications [site].
  • verbecc - Complete Conjugation of any Verb using Machine Learning for French, Spanish, Portuguese, Italian and Romanian.
  • wordfreq - Access a database of word frequencies, in various natural languages.
  • wordseer - text analysis tool, written in Flask [site].
  • wordtree - A Python library for generating word tree diagrams.

R

  • BTM - Biterm Topic Models for Short Text [cran].
  • cleanNLP - Package providing annotators and a normalized data model for natural language processing [cran].
  • CRAN Task View: Natural Language Processing.
  • corporaexplorer - An R package for dynamic exploration of text collections [cran], [site].
  • dfrtopics - package for exploring topic models of text.
  • koRpus - An R Package for Text Analysis [cran].
  • hunspell - High-Performance Stemmer, Tokenizer, and Spell Checker for R [cran], [site].
  • languageR - Analyzing Linguistic Data.
  • lda - Collapsed Gibbs Sampling Methods for Topic Models.
  • ldatuning - LDA models parameters tuning [cran].
  • lsa - Latent Semantic Analysis.
  • NLP - Basic classes and methods for Natural Language Processing.
  • openNLP - An interface to the Apache OpenNLP tools.
  • pattern.nlp - R package to perform sentiment analysis and Parts of Speech tagging for Dutch/French/English/German/Spanish/Italian.
  • quanteda - package for the Quantitative Analysis of Textual Data [cran], [site].
  • RKEA - interface to KEA (Keyphrase Extraction Algorithm).
  • r-corpus - Text corpus analysis in R [cran].
  • RMallet - An R Wrapper for the Java Mallet Topic Modeling Toolkit [cran].
  • sentencepiece - R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece [cran].
  • SnowballC - Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library.
  • spacyr - R wrapper to spaCy NLP [cran], [site].
  • stm - Estimation of the Structural Topic Model [cran], [site].
  • stopwords - Multilingual Stopword Lists in R [cran], [site].
  • stringi - Fast and portable character string processing in R (with the Unicode ICU) [cran], [site].
  • stringr - A fresh approach to string manipulation in R [cran], [site].
  • tau - Text Analysis Utilities.
  • Text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R [cran], [site].
  • textnets - R package to perform automated text analysis using network techniques.
  • textplot - Functionalities to easily visualise complex relations in texts [cran].
  • textplot - Plotting for text data.
  • textreuse - Detect text reuse and document similarity [cran], [site].
  • tidytext - Text mining using tidy tools [cran], [site].
  • tm - A framework for text mining applications within R.
  • tokenizers - Fast, Consistent Tokenization of Natural Language Text [cran], [site].
  • topicdoc - Topic-Specific Diagnostics for LDA and CTM Topic Models [cran], [site].
  • topicmodels - Latent Dirichlet Allocation (LDA) models and Correlated Topics Models (CTM).
  • udpipe - package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit [cran], [site].
  • wordcloud - Functionality to create pretty word clouds, visualize differences and similarity between documents, and avoid over-plotting in scatter plots with text.
  • wordnet - WordNet Interface.
  • wordVectors - package for building and exploring word embedding models.
  • zipfR - Statistical Models for Word Frequency Distributions [site].

Julia

  • CorpusLoaders - A variety of loaders for various NLP corpora.
  • Embeddings - Functions and data dependencies for loading various word embeddings (Word2Vec, FastText, GloVe).
  • Languages - A package for working with human languages.
  • Snowball - Snowball stemming algorithms.
  • StringAnalysis - Hard-Forked from JuliaText/TextAnalysis.jl.
  • TextAnalysis - Julia package for text analysis.
  • TextModels - Neural Network based models for Natural Language Processing.
  • WordLists - Dictionaries without definitions.
  • WordNet - A Julia package for Princeton's WordNet.
  • WordTokenizers - High performance tokenizers for natural language processing and other related tasks.
  • Word2Vec - Julia interface to word2vec.

JavaScript

  • compromise - modest natural-language processing [site].
  • natural - general natural language facilities for node.
  • nlp.js - An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more.
  • wink-nlp-utils - NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more [site].

Java

  • CoreNLP - Stanford CoreNLP: A Java suite of core NLP tools [site].
  • Mallet - package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text [site].
  • OpenNLP - The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text [site].

Go

