GitHub - aai-institute/beyond-jupyter: Software design principles for machine learning applications

The Beyond Jupyter project is a collection of resources for software design, with a specific focus on machine learning applications. The software being developed in machine learning contexts often remains at fairly low levels of abstraction and fails to satisfy well-established standards in software design and software engineering. One could argue that development environments such as Jupyter even actively encourage unstructured design. We thus deem it necessary to abandon the respective software development patterns and to metaphorically go "beyond Jupyter".

The goal of the course material is for practitioners to

understand how a principled software design approach supports every aspect of a machine learning project, accelerating both development & experimentation.

It is a common misconception that good design slows down development, while, in fact, the opposite is true. We showcase the limitations of (unstructured) procedural code and explain how principled design approaches can drastically increase development speed while simultaneously improving the quality of the code along multiple dimensions. We advocate object-oriented design principles, which naturally encourage modularity and map well to real-world concepts in the application domain, be they concrete or abstract. Our overarching goal is to foster

maintainability
efficiency
generality, and
reproducibility.

Preliminaries

The lecture content includes example code that requires data to run. You therefore need to set up a Python virtual environment, configure a project within your IDE, and download the required data.

Python Environment

Use conda to create an environment based on environment.yml.

conda env create -f environment.yml

This will create a conda environment named pop.

Configure Your IDE's Runtime Environment

Open this repository as a project in your IDE and configure it to use the pop environment created in the previous step.
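
To confirm that the IDE really runs the intended interpreter, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: the helper name running_in_env is hypothetical, and the check assumes the usual conda layout in which a named environment lives in a directory that carries the environment's name, so that sys.prefix ends in pop. Environments created at a custom path may not satisfy this.

```python
import sys
from pathlib import Path


def running_in_env(env_name: str, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix looks like the named conda environment.

    Hypothetical helper: assumes named conda environments are stored in
    <conda root>/envs/<env_name>, so the last path component is the name.
    """
    return Path(prefix).name == env_name


if __name__ == "__main__":
    if running_in_env("pop"):
        print("Interpreter belongs to the 'pop' environment.")
    else:
        print(f"Unexpected interpreter prefix: {sys.prefix}")
```

Running this snippet from within the IDE shows whether the project is bound to the pop environment or to some other interpreter.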

Downloading the Data

You can download the data in two ways:

Manually download it from the Kaggle website. Place the CSV file spotify_data.csv in the data folder (in the root of this repository).
Alternatively, use the script load_data.py to automatically download the raw data CSV file to the subfolder data on the top level of the repository. Note that a Kaggle API key, which must be configured in kaggle.json, is required for this (see instructions).
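
Whichever way you choose, the CSV file should end up at data/spotify_data.csv below the repository root. The following sketch checks that the file is in place and returns its column names. It is an illustration only: the function name read_csv_header is hypothetical, UTF-8 encoding is an assumption, and nothing here reflects how load_data.py itself works.

```python
import csv
from pathlib import Path


def read_csv_header(repo_root) -> list:
    """Return the column names of data/spotify_data.csv below repo_root.

    Hypothetical helper; raises FileNotFoundError if the file is missing.
    """
    csv_path = Path(repo_root) / "data" / "spotify_data.csv"
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {csv_path}; download it manually or via load_data.py"
        )
    # Read only the first row so the check stays fast for a large file.
    # UTF-8 is an assumption about the file's encoding.
    with csv_path.open(newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])
```

Called from the repository root as read_csv_header(Path.cwd()), it either lists the columns or fails with a message that names the expected location.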

Course Modules

Object-Oriented Programming: Essentials

This module explains the core principles of object-oriented programming (OOP), which lay the foundation for subsequent modules. If your familiarity with OOP concepts and design principles is low, or if its benefits are not yet clear to you, we highly recommend starting with this module.

At a structural level, OOP adds complexity, yet this complexity can be mitigated by using advanced development tools. We thus also include a section on the interplay between OOP and integrated development environments (IDEs) in this module.
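
To give a flavour of the kind of abstraction this module is about, here is a minimal sketch that is not taken from the course material. All class and function names (Metric, MeanAbsoluteError, RootMeanSquaredError, evaluate) are hypothetical. It shows a domain concept, an evaluation metric, represented as an abstract class, with calling code that depends only on the abstraction.

```python
import math
from abc import ABC, abstractmethod
from typing import Sequence


class Metric(ABC):
    """Abstraction for an evaluation metric (hypothetical example class)."""

    @abstractmethod
    def compute(self, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
        """Return the metric value for the given targets and predictions."""

    def name(self) -> str:
        return type(self).__name__


class MeanAbsoluteError(Metric):
    def compute(self, y_true, y_pred):
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class RootMeanSquaredError(Metric):
    def compute(self, y_true, y_pred):
        squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        return math.sqrt(squared / len(y_true))


def evaluate(metrics: Sequence[Metric], y_true, y_pred) -> dict:
    """Apply every metric; this function depends only on the Metric abstraction."""
    return {m.name(): m.compute(y_true, y_pred) for m in metrics}


results = evaluate(
    [MeanAbsoluteError(), RootMeanSquaredError()],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
)
print(results)
```

Adding a further metric means adding one class; evaluate stays untouched, which is the modularity benefit in miniature.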

Guiding Principles

This module puts forth our set of guiding principles for software development in machine learning applications. These principles can critically inform design decisions during development.

Spotify Song Popularity Prediction: A Refactoring Journey

This module addresses the full journey from a notebook implemented in Jupyter to a highly structured solution that is vastly more flexible and easier to maintain, and that strongly facilitates experimentation as well as deployment for production use. We transform the implementation step by step, clearly explaining the benefits achieved and naming the relevant principles being implemented along the way.
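
As a rough illustration of what a single step of such a transformation can look like, consider the sketch below. It is not the refactoring performed in the course: the Sampler class, its parameters, and the placeholder data are all hypothetical. The first part mimics a notebook cell with hard-coded parameters and global state; the second part expresses the same step as an object with explicit parameters.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

# Notebook style: parameters are hard-coded in the cell, and the result depends
# on hidden global state (the module-level random generator).
all_rows = list(range(100))  # placeholder standing in for the loaded records
random.seed(42)
notebook_sample = random.sample(all_rows, 10)


# Structured style: the same step as an object with explicit parameters.
@dataclass(frozen=True)
class Sampler:
    """Hypothetical example class describing how a sample is drawn."""

    num_samples: int
    random_seed: int = 42

    def sample(self, rows: Sequence[int]) -> List[int]:
        rng = random.Random(self.random_seed)  # local generator, no global state
        return rng.sample(list(rows), min(self.num_samples, len(rows)))


small = Sampler(num_samples=10)
large = Sampler(num_samples=50, random_seed=7)
print(small.sample(all_rows))
print(len(large.sample(all_rows)))
```

Because each Sampler carries its own parameters, several variants can coexist in one experiment, and repeated calls yield the same sample regardless of what other code has done to the global random state.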

Anti-Patterns

While the rest of the course material focusses on demonstrating positive design patterns, this module collects a number of common anti-patterns.
