A scraping tool for MSE Daily Reports, which are uploaded to the MSE site. The tool walks through the daily reports (according to the configuration), downloads the PDFs, converts them to CSV, and finally uploads the data to a Postgres database. What you do with the data is up to you! Just make sure you have a VPN when scraping.
We use PDF Tables to convert the PDFs to CSV, but you may use your own converter, in which case some of the logic in the downloader will need to be tweaked.
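For illustration, here is a minimal Go sketch of one conversion call. It assumes the public PDF Tables HTTP API (a multipart POST of the PDF under field `f` to `https://pdftables.com/api`); the paths and the `convertPDF` helper are hypothetical and not taken from this repo.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
)

// convertPDF posts a local PDF to the PDF Tables API and writes the
// returned CSV to csvPath.
func convertPDF(apiKey, pdfPath, csvPath string) error {
	pdf, err := os.Open(pdfPath)
	if err != nil {
		return err
	}
	defer pdf.Close()

	// Build a multipart body with the PDF under field "f",
	// as shown in the PDF Tables API examples.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("f", pdfPath)
	if err != nil {
		return err
	}
	if _, err := io.Copy(part, pdf); err != nil {
		return err
	}
	w.Close()

	url := fmt.Sprintf("https://pdftables.com/api?key=%s&format=csv", apiKey)
	resp, err := http.Post(url, w.FormDataContentType(), &body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("pdftables: unexpected status %s", resp.Status)
	}

	out, err := os.Create(csvPath)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	// Hypothetical paths for a single file; the real tool iterates over
	// the configured PDF number range.
	err := convertPDF(os.Getenv("PDFTABLES_API_KEY"), "raw/0001.pdf", "csv/0001.csv")
	if err != nil {
		log.Fatal(err)
	}
}
```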
To run this project, you will need to add the following environment variables to your `.env` file:

- `MSE_URL`: the absolute URL where a PDF is to be found, without the PDF number
- `PDF_START_NO`: PDF start number
- `PDF_END_NO`: PDF end number
- `RAW_PDF_PATH`: relative project path where you want to save PDFs
- `RAW_CSV_PATH`: relative project path where you want to save uncleaned CSVs
- `ERROR_FILE_PATH`: relative project path where you want to save errors
- `CLEANED_CSV_PATH`: relative project path where you want to save cleaned CSVs
- `QUEUE_SIZE`: maximum queue size of the worker pool
- `WORKER_NUM`: number of workers in the pool
- `PDFTABLES_API_KEY`: PDF Tables API key
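For reference, a filled-in `.env` might look like the following. Every value below is a placeholder, not a real MSE URL or API key:

```
MSE_URL=https://example.com/daily-reports/report_
PDF_START_NO=1
PDF_END_NO=100
RAW_PDF_PATH=data/raw_pdf
RAW_CSV_PATH=data/raw_csv
ERROR_FILE_PATH=data/errors
CLEANED_CSV_PATH=data/cleaned_csv
QUEUE_SIZE=100
WORKER_NUM=4
PDFTABLES_API_KEY=your-api-key-here
```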
After cloning this repo, make sure you copy `example.env` to `.env` and replace all the values in there with sensible configurations.
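For example, from the repo root:

```sh
cp example.env .env
```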
You may then build the program and run it in each mode with the following commands:
```sh
go build -o scraper
./scraper -mode download
# wait for completion
./scraper -mode clean
# wait for completion
./scraper -mode save
# wait for completion
```
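For context, here is a minimal sketch of how a `-mode` flag like this is commonly dispatched with Go's standard `flag` package. The comments describe the three stages; the actual structure of this repo's `main` may differ.

```go
package main

import (
	"flag"
	"log"
)

func main() {
	mode := flag.String("mode", "", "one of: download, clean, save")
	flag.Parse()

	switch *mode {
	case "download":
		// fetch the PDFs and convert them to raw CSVs
	case "clean":
		// normalise the raw CSVs into cleaned CSVs
	case "save":
		// load the cleaned CSVs into Postgres
	default:
		log.Fatalf("unknown mode %q", *mode)
	}
}
```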
Any errors incurred will be both logged to the terminal and recorded in the error path you provide. You may handle the errors however you see fit, including manually converting and saving the affected files.