Demo for the Data Science Research Infrastructure

Repository to demonstrate how to get started with the Data Science Research Infrastructure (DSRI) using Python.

Start a JupyterLab workspace from the template in the DSRI web UI catalog.

Optionally, provide this repository URL when asked for a Git repository to clone in the JupyterLab workspace. The required packages are then installed automatically when JupyterLab starts, from the requirements.txt and packages.txt files present at the root of the repository. Otherwise, you can simply clone the repository after deploying the workspace.

Clone the repository:

git clone https://github.com/MaastrichtU-IDS/dsri-demo.git
cd dsri-demo

MySQL demo notebook

  1. Start the MySQL database from the template in the DSRI web UI catalog
  2. Access the JupyterLab web UI, and open the terminal
  3. Run the mysql.ipynb notebook to load and query data in the MySQL database
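
As a rough illustration of the "load and query" step, here is a minimal sketch of what such a notebook cell could look like. It is not the content of mysql.ipynb. The `pymysql` driver, the connection settings, and the `demo_people` table are all assumptions made for this example; use the values shown for your own MySQL deployment in the DSRI web UI.

```python
# Hypothetical connection settings: replace them with the values of your
# own MySQL deployment. None of these names come from the repository.
MYSQL_SETTINGS = {
    "host": "mysql",        # assumed service name
    "user": "user",         # assumed credentials
    "password": "password",
    "database": "sampledb",
}


def load_and_query(conn, rows):
    """Create a small table, insert rows, and read them back sorted by age.

    conn is any open DB-API 2.0 connection whose driver uses %s placeholders.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS demo_people "
            "(name VARCHAR(100), age INT)"
        )
        cur.executemany(
            "INSERT INTO demo_people (name, age) VALUES (%s, %s)", rows
        )
        conn.commit()  # make the inserted rows permanent
        cur.execute("SELECT name, age FROM demo_people ORDER BY age")
        return cur.fetchall()
    finally:
        cur.close()


def main():
    # Imported here so the helper above can be used without the driver.
    import pymysql  # third-party package, assumed to be installed

    conn = pymysql.connect(**MYSQL_SETTINGS)
    try:
        for name, age in load_and_query(conn, [("Ada", 36), ("Alan", 41)]):
            print(name, age)
    finally:
        conn.close()
```

Calling `main()` in a notebook cell runs the example against the database. Running it twice inserts the rows twice, since the sketch does not deduplicate.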

PostgreSQL demo notebook

  1. Start the PostgreSQL database from the template in the DSRI web UI catalog
  2. Access the JupyterLab web UI, and open the terminal
  3. Run the postgresql.ipynb notebook to load and query data in the PostgreSQL database
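
Before running the notebook, it can help to check that the workspace reaches the database at all. The sketch below is one possible way to do that, not the content of postgresql.ipynb. It assumes the `psycopg2` driver is installed, and the environment variable names and default values are hypothetical.

```python
import os


def postgres_settings(env=None):
    """Build connection settings from environment variables.

    The variable names and the defaults are assumptions for this example.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("POSTGRES_HOST", "postgresql"),  # assumed service name
        "port": int(env.get("POSTGRES_PORT", "5432")),
        "dbname": env.get("POSTGRES_DB", "sampledb"),
        "user": env.get("POSTGRES_USER", "user"),
        "password": env.get("POSTGRES_PASSWORD", ""),
    }


def fetch_server_version(settings):
    """Connect, ask the server for its version string, and disconnect."""
    import psycopg2  # third-party package, assumed to be installed

    conn = psycopg2.connect(**settings)
    try:
        cur = conn.cursor()
        cur.execute("SELECT version()")
        return cur.fetchone()[0]
    finally:
        conn.close()
```

In a notebook cell, `print(fetch_server_version(postgres_settings()))` either prints the server version or raises an error that points at the connection settings.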

Long running script demo

Long-running tasks should not be run from the JupyterLab web UI: the connection might be lost, and JupyterLab is designed to visualize data, not to run and manage long-running tasks.

Multiple options are available. The easiest is to run your job as a plain Python script, without any additional library: just copy and paste the code blocks of the notebooks into a .py file.
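
A script built this way could be structured as in the sketch below. The `get_data` function is a hypothetical stand-in for the code copied from a notebook, not the actual content of script_get_data.py. The `flush=True` argument matters once the output is redirected to a file, because Python then buffers print() output and progress messages can otherwise appear late.

```python
import time


def get_data(n_batches, delay_seconds=0.5):
    """Stand-in for the notebook code: pretend to process a few batches."""
    results = []
    for i in range(n_batches):
        time.sleep(delay_seconds)  # placeholder for the real, slow work
        results.append(i * i)
        # flush=True writes the line immediately, even when stdout is a file
        print(f"batch {i + 1}/{n_batches} done", flush=True)
    return results


def main():
    data = get_data(n_batches=3)
    print(f"finished with {len(data)} results", flush=True)


if __name__ == "__main__":
    main()
```

Keeping the work inside functions and calling them from `main()` means the same file can be started from the terminal or imported back into a notebook for testing.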

Start the script via the ZSH terminal in detached mode (it will continue to run even if you close the terminal and the session):

python script_get_data.py &

Output from print() is shown in the terminal session where you started the script, but the script is not stopped if you leave the window or close the terminal session.

You can also send the logs to a file:

python script_get_data.py > script.log &

You can also start the script with the Bash terminal in detached mode:

bash
nohup python script_get_data.py &

The output generated by your Python script is written to the file nohup.out, in the folder where you started the script, instead of being printed directly to the terminal.
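
As an alternative to shell redirection or nohup.out, the script can write its own log file with the standard `logging` module. This is a minimal sketch; the logger name and the default file name are chosen for illustration only.

```python
import logging


def configure_logging(log_path="script.log"):
    """Return a logger that appends timestamped lines to log_path."""
    logger = logging.getLogger("script_get_data")  # illustrative name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Inside the script, `log = configure_logging()` followed by `log.info("batch done")` replaces the print() calls, and each line in the file then carries the time at which it was written.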

Use this command in the terminal to list all running processes and check whether your script is still running:

ps aux

You can also filter the output to see only your script:

ps aux | grep script_get_data
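
The same check can be sketched in Python, for example from a notebook cell. The helper names are hypothetical, and the code simply runs `ps aux` and filters its output. Note that the shell version above usually also lists the grep process itself, since its command line contains the search term.

```python
import subprocess


def matching_processes(ps_output, pattern):
    """Return the lines of ps output (header excluded) that contain pattern."""
    lines = ps_output.splitlines()
    return [line for line in lines[1:] if pattern in line]


def is_script_running(pattern="script_get_data"):
    """Run `ps aux` and report whether any process line contains pattern."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return bool(matching_processes(result.stdout, pattern))
```

This is a plain substring match, so any process whose command line contains the pattern is counted, including one that was itself started with the pattern as an argument.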

Workflow demo

To do.

Other demo repositories

MongoDB to be tested

  1. git clone https://github.com/pedrohserrano/twitter-covid-scam
  2. Download json_covid.zip (to be hosted somewhere, currently on a USB stick in Pedro's moving boxes)
  3. Run Dump2Mongo.ipynb
  4. Run Analysis.ipynb
