A scraper for finding documentation on Google Scholar. Its key feature is downloading the PDF files directly, which simplifies searching for documentation.
This was designed for personal use.
Requirements:
- python 3
- selenium (Firefox)
- langid python library
Clone this git repo, then run:
git submodule update --init --recursive
Note: make sure to use Firefox version 74, or replace the geckodriver with one that matches your Firefox version (see the selenium website).
./search.py <QUERY> <LIMIT>
<QUERY> is required.
<LIMIT> is optional.
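As a rough illustration of the usage above, here is a minimal sketch of how one required and one optional positional argument could be parsed with Python's standard `argparse` module. This is not taken from `search.py`: the function names `build_parser` and `parse_args`, the integer type for `LIMIT`, and the `None` default are all assumptions made for the example.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring "./search.py <QUERY> <LIMIT>".
    parser = argparse.ArgumentParser(
        prog="search.py",
        description="Search Google Scholar and download the PDF files.",
    )
    # QUERY is required: a plain positional argument.
    parser.add_argument("query", metavar="QUERY", help="search query")
    # LIMIT is optional: nargs="?" lets the positional be omitted.
    # Assumption: LIMIT is an integer and None means "no limit given".
    parser.add_argument(
        "limit",
        metavar="LIMIT",
        nargs="?",
        type=int,
        default=None,
        help="maximum number of results (optional)",
    )
    return parser


def parse_args(argv=None):
    # argv=None makes argparse read from sys.argv[1:].
    return build_parser().parse_args(argv)


# Example call with an explicit argument list instead of the real command line.
args = parse_args(["machine learning", "10"])
print(args.query, args.limit)
```

With this sketch, omitting `<LIMIT>` leaves `args.limit` as `None`, and omitting `<QUERY>` makes `argparse` print a usage message and exit.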
Feature requests and contributions are welcome. Feel free to report errors on the issue tracker at https://github.com/lucgerrits/google-scholar-scraper/issues.
- add argument parser
- Security
- Some improvements