linuxphile/nlp_lesson

Description

The script performs the following tasks:

Tokenization:

It splits the text into tokens: words, punctuation marks, symbols, and other meaningful units. The output is a list of tokens.
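As a minimal sketch of this step (not necessarily how the script does it), the example below tokenizes a made-up sentence with spaCy. It assumes spaCy is installed and uses `spacy.blank("en")`, which provides the English tokenizer without requiring a model download; the sample text and variable names are illustrative.

```python
import spacy

# Tokenizer-only English pipeline; no trained model is needed for this step.
nlp = spacy.blank("en")

# Made-up sample text, used only for illustration.
text = "The quick brown fox is running, and the dog is sleeping."
doc = nlp(text)

# Each token keeps its original text; punctuation becomes its own token.
tokens = [token.text for token in doc]
print(tokens)
```

Note that punctuation marks such as the comma and the final period come out as separate tokens, which is what makes the later punctuation removal step straightforward.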

Stop Words Removal:

It removes common words (such as 'is', 'the', and 'and') that carry little meaning on their own.

Punctuation Removal:

It removes punctuation from the text.
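Stop word removal and punctuation removal can both be expressed as filters over the token list. The sketch below assumes spaCy's built-in token flags `is_stop` and `is_punct`; the script itself may filter differently, and the sample sentence is made up.

```python
import spacy

# Tokenizer-only pipeline; the stop word list ships with the English language data.
nlp = spacy.blank("en")
doc = nlp("The quick brown fox is running, and the dog is sleeping.")

# Keep only tokens that are neither stop words nor punctuation.
content_words = [
    token.text
    for token in doc
    if not token.is_stop and not token.is_punct
]
print(content_words)
```

Doing both checks in a single pass keeps the remaining tokens in their original order.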

Frequency Count:

It counts how often each word occurs in the text and prints the five most common words.
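One common way to do this in Python is `collections.Counter` from the standard library. The sketch below assumes the words have already been cleaned by the earlier steps; the word list is made up for illustration.

```python
from collections import Counter

# Hypothetical list of already-cleaned words (stop words and punctuation removed).
words = ["fox", "dog", "fox", "runs", "dog", "fox", "sleeps", "barks", "jumps", "cat"]

# Count occurrences of each word.
counts = Counter(words)

# most_common(5) returns the five highest counts as (word, count) pairs.
top_five = counts.most_common(5)
for word, count in top_five:
    print(f"{word}: {count}")
```

Words with equal counts are returned in the order they were first seen, so the tail of the top five depends on the order of the input.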

Lemmatization:

It reduces each word to its base (dictionary) form, the lemma (for example, 'running' to 'run').

Part-of-Speech (POS) Tagging:

It labels each word in the text with its part of speech (noun, verb, adjective, and so on).

Named Entity Recognition (NER):

It identifies named entities in the text and classifies them into predefined categories such as people, organizations, and locations.

Dependency Parsing Visualization:

It visualizes the grammatical structure of each sentence, showing how the words depend on one another.

Requirements

  • Python
  • spaCy
  • en_core_web_sm (spaCy English model)
