ANA-POTJE/SentimentalistsApp-Backend

SentimentalistsApp-Backend

The backend service for The Sentimentalists article analysis service.
The source code is written in Python.
The app is built and deployed on AWS Lambda.

Image of Backend

Folder Structure:

The SentimentalistsApp-Backend repository is divided into the following folders:

infra \ prod

Contains the Terraform files:

  • main.tf
  • variables.tf

src

Contains the source code of our Python modules.

  • downloadPunkt.py
    Downloads NLTK Punkt, which is used by the TextBlob library, in order to reduce the size of the package that the automation passes to AWS Lambda.

  • getBiasScore.py
    Calculates the Trust Score based on the credibility, polarity and subjectivity values.

  • getCredibilityScore.py
    Calls the API Gate Source Credibility, passing the URL. Returns the URL's Credibility Score, its Category (Left Center, Fake News, ...) and the Source which rated the website (Media Bias / Fact Check, etc.).

  • getSecret.py
    Calls AWS Secret Manager and returns the requested secret as a dict of key/value pairs.

  • getText.py
    Calls the Python library "Newspaper", which retrieves the text (article) from a URL. Returns the TEXT, HEADER, SUMMARY, KEYWORDS and TOP_IMAGE of the news article.

  • lambda_function.py
    Main module of our backend app. It first validates the URL, then calls the following Python modules:

    1. getCredibilityScore.py
    2. sentimentAnalysis.py
    3. getBiasScore.py
    4. spacyMatcher.py

    Each of these modules returns results that will populate our JSON file, which will be sent to the frontend via AWS Lambda.

  • sentimentAnalysis.py
    Reads a URL, then calls the function "getText" to convert the HTML into unformatted text. It then calls the Python library TextBlob, which analyses the "sentiment" of the text, and finally returns the polarity and subjectivity of the whole text.

  • spacyMatcher.py
    Calls the Python library spaCy with a TEXT to be analysed and a specific TAG (or '' for all tags). lambda_function.py calls spacyMatcher with TAG = '', so all tags are returned. The output of this function is a list of dictionaries of the form {'type': tag, 'topic': obj}. The tags currently used are listed below.
    PERSON - People, including fictional.
    ORG - Companies, agencies, institutions, etc.
    GPE - Countries, cities, states.
    PERCENT - Percentage, including "%".
    LANGUAGE - Any named language.
    DATE - Absolute or relative dates or periods.
    TIME - Times smaller than a day.
    LOC - Non-GPE locations, mountain ranges, bodies of water.
    NORP - Nationalities or religious or political groups.
    EVENT - Named hurricanes, battles, wars, sports events, etc.
    WORK_OF_ART - Titles of books, songs, etc.
    MONEY - Monetary values, including unit.
    QUANTITY - Measurements, as of weight or distance.
    ORDINAL - “first”, “second”, etc.
    CARDINAL - Numerals that do not fall under another type (not ordinal, quantity, ...).
    Note: the app frontend currently uses the following spaCy tags: PERSON, ORG, GPE, EVENT and WORK_OF_ART.
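
The output format described for spacyMatcher.py can be illustrated with a minimal sketch. It is not the repository's code: the function name `match_entities` is hypothetical, and using the entity text as the 'topic' value is an assumption. It relies only on the `ents`, `label_` and `text` attributes that spaCy exposes on a parsed document, and uses a stand-in object so it runs without a spaCy model installed.

```python
from types import SimpleNamespace


def match_entities(doc, tag=""):
    """Return [{'type': label, 'topic': text}, ...] for the entities in a spaCy Doc.

    An empty tag means "all tags"; otherwise only entities with that label are kept.
    (Hypothetical helper, not the project's spacyMatcher.)
    """
    return [
        {"type": ent.label_, "topic": ent.text}
        for ent in doc.ents
        if tag == "" or ent.label_ == tag
    ]


# Stand-in for a parsed spaCy Doc, so the sketch runs without a model.
# With spaCy installed, a real Doc would come from something like
# spacy.load("en_core_web_sm")(text); the model name is an assumption.
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Angela Merkel", label_="PERSON"),
    SimpleNamespace(text="Germany", label_="GPE"),
    SimpleNamespace(text="2019", label_="DATE"),
])

all_entities = match_entities(fake_doc)        # tag = '' keeps every label
people = match_entities(fake_doc, "PERSON")    # only PERSON entities
```

Filtering on the frontend's subset (PERSON, ORG, GPE, EVENT, WORK_OF_ART) could then be done either by calling the function once per tag or by filtering the full list on its 'type' key.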

The following files are used by the automation to install objects, compress or delete them, or list the Python libraries that must be installed:

  • build-requirements.txt
  • build.sh
  • package.sh
  • requirements.txt
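
The flow described above for lambda_function.py (validate the URL, call the four modules in order, return JSON) can be sketched roughly as follows. This is an illustration under assumptions, not the project's handler: `is_valid_url`, `handle`, the response keys and the stubbed callables are all hypothetical, and the `statusCode`/`body` response shape is assumed rather than taken from the repository. In the project, spacyMatcher receives the article text; the stub here hides that step.

```python
import json
from urllib.parse import urlparse


def is_valid_url(url):
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def handle(url, credibility, sentiment, bias, entities):
    """Validate the URL, run the four analysis steps in order, build a JSON response.

    The four callables are placeholders standing in for the project's modules.
    """
    if not is_valid_url(url):
        return {"statusCode": 400, "body": json.dumps({"error": "invalid URL"})}

    cred = credibility(url)        # stands in for getCredibilityScore.py
    sent = sentiment(url)          # stands in for sentimentAnalysis.py
    trust = bias(cred, sent)       # stands in for getBiasScore.py
    ents = entities(url)           # stands in for spacyMatcher.py
    body = {
        "credibility": cred,
        "sentiment": sent,
        "trust_score": trust,
        "entities": ents,
    }
    return {"statusCode": 200, "body": json.dumps(body)}


# Stubbed steps with made-up values, only to show the call order and output shape.
stubs = dict(
    credibility=lambda url: {"score": 0.8},
    sentiment=lambda url: {"polarity": 0.1, "subjectivity": 0.4},
    bias=lambda cred, sent: 0.5,
    entities=lambda url: [{"type": "ORG", "topic": "Example Corp"}],
)
ok = handle("https://example.com/article", **stubs)
bad = handle("not a url", **stubs)
```

Passing the steps in as callables keeps the orchestration testable without network access or the NLP libraries installed.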

tests

Contains the Python modules used to run the tests (pytest library).
We are currently running 50 tests, distributed as follows:

  • test_checkCredibilityScore.py (2 tests)
  • test_getBiasScore.py (2 tests)
  • test_getSecret.py (2 tests)
  • test_getText.py (6 tests)
  • test_lambda_handler.py (15 tests)
  • test_sentimentAnalysis.py (7 tests)
  • test_spacyMatcher.py (16 tests)
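
As an illustration of how a module such as getSecret.py can be tested without calling AWS, here is a minimal pytest-style sketch. The function `get_secret`, the stub client and the secret name are assumptions, not the repository's code; only the `get_secret_value(SecretId=...)` call and the `SecretString` response key mirror the boto3 Secrets Manager client.

```python
import json


def get_secret(secret_id, client):
    """Fetch a secret and parse its JSON string into a dict of key/value pairs.

    `client` is expected to behave like boto3.client("secretsmanager").
    (Hypothetical helper, not the project's getSecret.)
    """
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class StubSecretsClient:
    """Mimics the single Secrets Manager call used above, with in-memory data."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret_value(self, SecretId):
        return {"SecretString": json.dumps(self._secrets[SecretId])}


def test_get_secret_returns_dict():
    # The secret name and contents are made up for the test.
    client = StubSecretsClient({"example/secret": {"api_key": "dummy"}})
    assert get_secret("example/secret", client) == {"api_key": "dummy"}
```

pytest collects any function whose name starts with `test_`, so a file like this runs with a plain `pytest` invocation and needs no AWS credentials.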

INSTALL.md (file)

The file INSTALL.md contains the commands used to create the local Anaconda environment, as well as the settings needed to run pytest and the important environment variables that are set locally.

SCOPE.md (file)

The file SCOPE.md has a list of the libraries and APIs used in the backend code. It also has a list of ideas that can be implemented in future MVPs.
