diff --git a/.gitignore b/.gitignore
index f78ab294b..5d9e50463 100644
--- a/.gitignore
+++ b/.gitignore
@@ -28,3 +28,4 @@ tmp/
 !/data/.gitkeep
 .vagrant/
 .ruby-version
+docx_files/
\ No newline at end of file
diff --git a/pombola/south_africa/data/members-interests/README.md b/pombola/south_africa/data/members-interests/README.md
index 198df132f..548533e6e 100644
--- a/pombola/south_africa/data/members-interests/README.md
+++ b/pombola/south_africa/data/members-interests/README.md
@@ -7,11 +7,11 @@ There are several files in this directory:
 
 The scraper currently scrapes `.docx` files. To prepare the file:
 
-1. Split the `PDF` into seperate files small enough to open in Google Docs. [PDF Arranger](https://github.com/pdfarranger/pdfarranger) works well
-2. Open the files in Google Docs and download each in `.docx` format
-3. Store the these files in `./docx_files/`
+1. Split the `PDF` into seperate files small enough to open in Google Drive. [PDF Arranger](https://github.com/pdfarranger/pdfarranger) works well
+2. Open the files in Google Drive and download each in `.docx` format
+3. Store the these files in `./pombola/south_africa/data/members-interests/scraper/docx_files/`
 
-Create an environment and install dependencies using
+Create an environment and install dependencies in the `./pombola/south_africa/data/members-interests/scraper` directory:
 ```
 virtualenv venv
 source venv/bin/activate
diff --git a/pombola/south_africa/data/members-interests/scraper/.python-version b/pombola/south_africa/data/members-interests/scraper/.python-version
new file mode 100644
index 000000000..56d91d353
--- /dev/null
+++ b/pombola/south_africa/data/members-interests/scraper/.python-version
@@ -0,0 +1 @@
+3.10.12