This repository has been archived by the owner on Mar 29, 2022. It is now read-only.

ScienceBeam Airflow

⚠️ Under new stewardship

eLife have handed over stewardship of ScienceBeam to The Coko Foundation. You can now find the updated code repository at https://gitlab.coko.foundation/sciencebeam/sciencebeam-airflow and continue the conversation on Coko's Mattermost chat server: https://mattermost.coko.foundation/

For more information on why we're doing this read our latest update on our new technology direction: https://elifesciences.org/inside-elife/daf1b699/elife-latest-announcing-a-new-technology-direction

Overview

Airflow pipeline for ScienceBeam related training and evaluation.

Apache Airflow:

is a platform to programmatically author, schedule, and monitor workflows. ... Airflow is not a data streaming solution.

We are using the official Airflow Image.

Prerequisites

gcloud setup

gcloud auth application-default login

Configuration

Using the official Airflow Image, Airflow is mainly configured in the following ways:

  • Environment variables interpreted by Airflow, e.g. AIRFLOW__CORE__SQL_ALCHEMY_CONN
  • The default configuration provided by the Airflow project in default_airflow.cfg

(Since we are using Docker Compose, environment variables are passed in via docker-compose.yml.)
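For example, an environment variable override could be passed to the Airflow container via docker-compose.yml along these lines (a minimal sketch; the service name, image tag and connection string are illustrative assumptions, not the actual values used by this project):

```yaml
# docker-compose.yml (illustrative fragment)
services:
  webserver:
    image: apache/airflow:1.10.15   # hypothetical tag
    environment:
      # overrides the corresponding setting from default_airflow.cfg
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    ports:
      - "8080:8080"
```

Any setting in default_airflow.cfg can be overridden this way, using the AIRFLOW__{SECTION}__{KEY} naming convention.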

Deployment

The Dockerfile is used to build the image that is deployed to the cluster.
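A deployment image of this kind typically extends the official Airflow Image and adds the project's DAGs and Python dependencies. The following is a minimal sketch only, not the actual Dockerfile of this repository; the base image tag and file paths are assumptions:

```dockerfile
# Illustrative sketch -- base tag and paths are assumptions
FROM apache/airflow:1.10.15

# install the project's Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# add the DAG definitions picked up by the Airflow scheduler
COPY dags/ ${AIRFLOW_HOME}/dags/
```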

Development

The Docker Compose configuration is only used for development purposes (in the future it could in part be used to build the image).

For development, it makes the local gcloud config available to the Airflow container.

Start

Build and start the image.

make start

The Airflow admin UI will be available on port 8080 and Celery Flower on port 5555.

Test

Build and run tests.

make test

Stop

make stop

Clean

make clean