Python utility for scraping vocabulary entries from Italy's best encyclopaedia: Treccani

fcagnola/tr3ccani-scraper

Tr3ccani-scraper

If you've ever thought "I wonder what the definition of 'pasta' is in Italian", this CLI is for you!

The Treccani Encyclopaedia is a renowned and revered source of knowledge for any respectable Italian 🇮🇹.

Since they do not provide any API exposing their precious dictionary, I thought I could put my limited knowledge of web scraping and regular expressions to good use.

Unfortunately I don't really have time to maintain the project, but since I plan to use it as a command-line resource I will probably make it better over time.

Anyway, if you decide to use it and have suggestions, feel free to fork, add a branch and submit a pull request: I'll do my best to review and merge it.

Usage:

The package is written in Python 3.9, but I think anything above Python 3.6 should work.

First of all, clone the repository to a local folder on your machine.

cd <your desired folder>
git clone https://github.com/fcagnola/tr3ccani-scraper.git

To scrape the web I used the fantastic requests-html package, which you'll need to install together with rich for good-looking console output:

pip3 install -r requirements.txt
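
For context, here is a rough sketch of what fetching a vocabulary page with requests-html can look like. It is not the code in scraper.py: the names build_url and fetch_page are hypothetical, and the URL pattern is an assumption about how treccani.it lays out its vocabulary pages, so check it against the site before relying on it.

```python
from urllib.parse import quote

# Assumption: vocabulary pages live under this path. Verify on the site.
BASE_URL = "https://www.treccani.it/vocabolario/"


def build_url(word):
    """Build the (assumed) vocabulary URL for a word, percent-encoding accents."""
    return BASE_URL + quote(word.strip().lower()) + "/"


def fetch_page(word):
    """Download the page for a word and return its HTML as a string."""
    # Imported here so build_url stays usable without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(build_url(word), timeout=10)
    response.raise_for_status()  # fail loudly on 404 and similar errors
    return response.text
```

Keeping the URL construction separate from the network call makes the first part easy to check offline.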

Finally, to get the definition(s) of a word, type

cd <folder where you cloned the repository>
python3 scraper.py <word>

The utility prints one result per line. For now it only extracts the first "use" from each page, but it does handle multiple pages: a word can have several distinct meanings, each on its own page, and each meaning can list several uses.
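
To illustrate the "first use per page" behaviour, here is a minimal sketch using regular expressions. It assumes a made-up markup in which every use sits in a `<p class="definition">` element; the real Treccani pages are structured differently, and the names first_use and first_uses are hypothetical rather than taken from scraper.py.

```python
import html
import re

# Hypothetical markup: assumes each use sits in <p class="definition">...</p>.
DEFINITION_RE = re.compile(r'<p class="definition">(.*?)</p>', re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def first_use(page_html):
    """Return the first use found in one page, or None if there is none."""
    match = DEFINITION_RE.search(page_html)
    if match is None:
        return None
    text = TAG_RE.sub("", match.group(1))  # drop nested tags such as <b>
    text = html.unescape(text)             # decode entities such as &amp;
    return " ".join(text.split())          # collapse runs of whitespace


def first_uses(pages):
    """Return one result per page, skipping pages without a match."""
    results = []
    for page_html in pages:
        use = first_use(page_html)
        if use is not None:
            results.append(use)
    return results


if __name__ == "__main__":
    sample_pages = [
        '<p class="definition"><b>pasta</b> s. f. Impasto di farina &amp; acqua</p>',
        '<p class="definition">Altro\n   significato</p>',
    ]
    for line in first_uses(sample_pages):
        print(line)
```

Supporting every use on a page would mostly mean swapping `search` for `findall` in this sketch, at the cost of having to decide how to group the output.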

I'm working on making it available as a Python package, so that it can be used from other scripts, and on adding command-line arguments for saving the output.
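
As a rough idea of what such command-line arguments might look like, here is a sketch built on the standard argparse module. The `-o/--output` option and the helper names are hypothetical: nothing like them exists in scraper.py yet.

```python
import argparse
import sys


def build_parser():
    """Create a parser with the word to look up and an optional output file."""
    parser = argparse.ArgumentParser(
        prog="scraper.py",
        description="Look up a word in the Treccani vocabulary.",
    )
    parser.add_argument("word", help="word to look up")
    # Hypothetical option: not part of the current script.
    parser.add_argument(
        "-o", "--output", metavar="FILE",
        help="write results to FILE instead of the console",
    )
    return parser


def write_results(results, output_path=None):
    """Write one result per line to a file, or to stdout if no path is given."""
    text = "".join(line + "\n" for line in results)
    if output_path is None:
        sys.stdout.write(text)
    else:
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(text)
```

With this in place, a call such as `python3 scraper.py pasta -o pasta.txt` would parse into `word="pasta"` and `output="pasta.txt"`, and the same functions could be imported from other scripts.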

Disclaimer

The goal of this utility is simply to look up a few words from a very reliable source. I do not own any content of treccani.it, and this script is not intended for production environments. The license covers only my script, not any material from the encyclopaedia website. Feel free to get in touch with me for any questions it does not cover.
