Team members can access the running notes for meetings, which provide details of the project goals and decisions.
Data source: https://results2021.ref.ac.uk/ (accessed 2023-08-10)
Information on submission system data requirements: https://ref.ac.uk/guidance-and-criteria-on-submissions/guidance/submission-system-data-requirements/ (accessed 2023-08-30).
Local copy of the download page: Submission_system_data_requirements-REF2021.pdf (accessed 2023-08-10)
A Streamlit data explorer hosted on Azure is available at https://ref2021explorer.azurewebsites.net
Follow these steps to set up the environment for this project:
- Install Python 3.x on your system if it is not already installed.
- Clone this project from GitHub.
- Navigate to the project root directory in your terminal or command prompt.
- Create a new virtual environment with:
    python3 -m venv venv
  This will create a new virtual environment named venv in the current directory.
- Activate the virtual environment with:
  On Windows:
    venv\Scripts\activate.bat
  On Unix/Linux/MacOS:
    source venv/bin/activate
  This will activate the virtual environment and change your prompt to indicate that you are now working inside the virtual environment.
- Install the project dependencies with:
    pip install -r requirements.txt
  This will install all of the required packages and their versions listed in the requirements.txt file.
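The virtual environment steps above can also be scripted. The following is a minimal sketch and not part of the project: it uses Python's standard venv module to create the environment, and the helper names activation_command and create_environment are hypothetical. The activation paths mirror the commands listed in the steps.

```python
import os
import venv
from pathlib import Path, PurePosixPath, PureWindowsPath


def activation_command(env_dir: str = "venv", os_name: str = os.name) -> str:
    """Return the command that activates the environment in env_dir.

    os_name follows os.name: "nt" on Windows, "posix" on Unix/Linux/macOS.
    """
    if os_name == "nt":
        # Windows: venv\Scripts\activate.bat
        return str(PureWindowsPath(env_dir) / "Scripts" / "activate.bat")
    # Unix-like shells: source venv/bin/activate
    return "source " + str(PurePosixPath(env_dir) / "bin" / "activate")


def create_environment(env_dir: str = "venv") -> Path:
    """Create the environment, comparable to running `python3 -m venv venv`."""
    # with_pip=True installs pip into the new environment, as the CLI does.
    venv.EnvBuilder(with_pip=True).create(env_dir)
    return Path(env_dir)
```

Calling create_environment() from the project root and then running the string returned by activation_command() in your shell should leave you ready for the pip install step. Activation itself still has to happen in the shell, because a Python process cannot change its parent shell's environment.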
The raw environment statements, supplied as PDFs, were converted to text with the pdftotext(1) tool from poppler-utils.
The version used was poppler-utils 22.12.0.
The conversion was done on a Debian bookworm system on an x86_64 architecture.
The script is not dockerised, but it could be run in a container based on the debian:bookworm-slim image if required.
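To convert the PDFs to *.txt files, run this script in the folder containing the PDFs:
#!/bin/sh
# Convert each PDF in the current directory to a text file of the same name,
# preserving the original page layout.
for i in *.pdf; do
    [ -e "$i" ] || continue  # skip the unexpanded pattern when no PDFs exist
    pdftotext -layout "$i"
done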
The resulting *.txt files are then copied into the data/processed/environment_statements folder.
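The copy step can be automated if preferred. The sketch below is an illustration only: the function name copy_text_files is hypothetical, and it assumes the text files should be copied (not moved) and that the target folder may need to be created first.

```python
import shutil
from pathlib import Path


def copy_text_files(source_dir, target_dir="data/processed/environment_statements"):
    """Copy every *.txt file in source_dir into target_dir.

    Returns the list of destination paths, sorted by file name.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    copied = []
    for txt in sorted(Path(source_dir).glob("*.txt")):
        # copy2 keeps file timestamps along with the contents
        copied.append(Path(shutil.copy2(txt, target / txt.name)))
    return copied
```

For example, running copy_text_files("path/to/pdf_folder") from the project root would place the converted files under data/processed/environment_statements; the source path here is a placeholder.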