Scripts to scrape quiz content off Edx, remove answers & consolidate to a single html for print-out/practice/review purposes.
This project requires:
- Python 3, and
- a Unix based operation system (e.g. MacOS or Linux). Sorry Windows users, you deserve a better operating system.
Get the source code onto your computer:
if you have git:
git clone [email protected]:weilu/towardmit.git
Otherwise click the green button "Clone or download" on this page, click "Download ZIP". Then unzip the file you downloaded.
Then open the Terminal app (MacOS) or the equivalent of a console thing on other unix systems and executing the following commands:
cd towardmit
pip install -r requirements.txt
cp scrape_sample.sh scrape.sh
In the scrape.sh file, the [your request headers] needs to be replaced by request headers obtained from your browser. The request headers will be a series of '-H' options, which includes your edX login details.
These headers can be found by doing the following.
(instructions are with the Chrome or Chromium web-browser, and tested using Linux & MacOS)
- Open your edX dashboard, logging in as necessary
- Open 'Developer Tools', which is a sub-menu item from 'More tools' in the menu
- Choose the 'Network' tab in the Developer pane
- Reload the dashboard web-page
- Using the first entry in the 'Network' tab, named 'dashboard', open up the context (right-click) menu and choose 'Copy as cURL'
- Using a text editor, copy all the '-H' options (which will follow the "curl 'https://courses.edx.org/dashboard'"), but DO NOT copy any option which will compress the output (e.g. "-H 'Accept-Encoding: gzip, deflate, br'" and "--compressed". There will be many lines of '-H' options!
- Copy the required '-H' options into your scrape.sh file, replacing '[your request headers]'
python scrape.py
The generated quiz html files can be found in your out
directory