Streamline Your MLOps Journey with Automation
The mlops-course project automates model training and hyperparameter tuning, using hyperopt to optimize machine learning models reliably. The repository includes styling and cleaning tasks for code consistency, cluster compute environment configurations, and deployment scripts for managing workloads and serving models with FastAPI. Key components such as logging configuration, data-handling utilities, and distributed model training with PyTorch and BERT are integrated into the project. Deployment is handled on Kubernetes clusters with YAML configurations, making model deployment and serving scalable and efficient.
| | Feature | Description |
|---|---|---|
⚙️ | Architecture | The project follows a modular architecture with components for model training, evaluation, and serving using Ray, PyTorch, and FastAPI. It supports distributed computing for adaptive training and deployment. |
🔩 | Code Quality | The codebase maintains high code quality standards with automated formatting (Black, isort), linting (Flake8), and testing (Pytest). Pre-commit hooks enforce code consistency, ensuring clean and reliable code. |
📄 | Documentation | Extensive documentation covering setup, usage, and codebase details is available. README files, inline comments, and docs generation using MkDocs ensure clear and comprehensive documentation. |
🔌 | Integrations | Key integrations include MLflow for experiment tracking, GitHub Actions for CI/CD automation, and deployment with Ray for distributed computing. External dependencies like Transformers and Snorkel enhance model capabilities. |
🧩 | Modularity | The codebase is highly modular, facilitating code reusability and maintainability. Various modules handle tasks such as data loading, model training, evaluation, and serving, allowing for easy extension and customization. |
🧪 | Testing | Testing is thorough with Pytest covering unit and integration tests. Code coverage is tracked using pytest-cov ensuring reliable software quality. |
⚡️ | Performance | The project emphasizes efficiency and speed with distributed training capabilities using Ray and optimized model architectures with PyTorch. It focuses on resource utilization to enhance performance. |
🛡️ | Security | Security measures include secure user authentication and adherence to best practices for data protection. The project maintains data integrity and access control for sensitive information. |
📦 | Dependencies | Key external libraries such as scikit-learn, transformers, and mlflow are utilized for machine learning tasks. Tools like Flask and FastAPI support web server functionalities. |
```
└── mlops-course/
    ├── .github
    │   └── workflows
    ├── LICENSE
    ├── Makefile
    ├── README.md
    ├── datasets
    │   ├── dataset.csv
    │   ├── holdout.csv
    │   ├── projects.csv
    │   └── tags.csv
    ├── deploy
    │   ├── cluster_compute.yaml
    │   ├── cluster_env.yaml
    │   ├── jobs
    │   └── services
    ├── docs
    │   ├── index.md
    │   └── madewithml
    ├── madewithml
    │   ├── config.py
    │   ├── data.py
    │   ├── evaluate.py
    │   ├── models.py
    │   ├── predict.py
    │   ├── serve.py
    │   ├── train.py
    │   ├── tune.py
    │   └── utils.py
    ├── mkdocs.yml
    ├── notebooks
    │   ├── benchmarks.ipynb
    │   └── madewithml.ipynb
    ├── pyproject.toml
    ├── requirements.txt
    └── tests
        ├── code
        ├── data
        └── model
```
File | Summary |
---|---|
requirements.txt | Automates model training and hyperparameter tuning using hyperopt to optimize ML models, helping enhance model performance reliably in the MLOps course project. |
Makefile | Styling and cleaning tasks for the repository, ensuring code consistency, and removing unnecessary files for efficient maintenance. |
pyproject.toml | Manages code formatting with Black, isort, and Flake8, ensuring clean, consistent code across the repository while excluding common directories and specific files from Pytest coverage. |
deploy
File | Summary |
---|---|
cluster_env.yaml | Configure cluster computing environment for Ray with post-build Python package installations. |
cluster_compute.yaml | Manages cloud resources for madewithml deployment in us-east2, specifying head and worker node types with their configurations. Initializes BlockDeviceMappings and TagSpecifications. |
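For orientation, a Ray/Anyscale cluster environment file typically has the shape sketched below; the image tag, package list, and post-build command here are assumptions, not the repository's actual values:

```yaml
base_image: anyscale/ray:2.7.0-py310   # hypothetical image tag
env_vars: {}
python:
  pip_packages: []
post_build_cmds:
  # Install project dependencies after the base image is built.
  - python3 -m pip install -r requirements.txt
```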
deploy.jobs
File | Summary |
---|---|
workloads.yaml | Manages workloads in the cluster environment for a specific project, handling configurations and runtime environment variables. |
workloads.sh | Automates testing, training, evaluating, and deploying machine learning models on MadeWithML platform. |
deploy.services
File | Summary |
---|---|
serve_model.yaml | Serve a Machine Learning model with Ray for madewithml project, defining runtime environment, upload path, and rollout strategy. |
serve_model.py | Serves model deployment by fetching artifacts from S3 and configuring the entrypoint based on run ID and threshold. |
madewithml
File | Summary |
---|---|
config.py | Manages logging configuration and MLflow setup within the project, sets up directories, logger, and constraints for effective tracking and monitoring. |
models.py | Defines a FinetunedLLM module that wraps a pretrained LLM and adds dropout and classification layers for fine-tuning. |
predict.py | Predict tags and probabilities for project titles and descriptions using MLflow experiments and TorchPredictor. |
serve.py | FastAPI application serving a machine learning model for project classification with health check, run ID retrieval, evaluation, and prediction endpoints. |
utils.py | Utility functions for reproducible experimentation, data handling, array padding, and tensor conversion within the AI/ML workflow. Includes setting seeds, loading/saving dictionaries, array padding, collating batch data, converting dict to list, and fetching MLflow run IDs. |
tune.py | Defines the CLI, configures the tuning workload, runs hyperparameter tuning, and logs results. |
train.py | Trains a distributed model using Ray for adaptive training and evaluation, leveraging PyTorch and BERT. The CLI app enables training configuration setup and result saving. |
evaluate.py | CLI script to evaluate model performance metrics on datasets, showcasing overall and per-class results alongside slice metrics for NLP projects. |
data.py | Handles dataset loading, stratified split, text cleaning, and tokenization. Includes a custom preprocessor for data transformation. |
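Two of the helpers summarized above (text cleaning from data.py, array padding from utils.py) can be sketched as follows; the signatures and the stopword list are assumptions based on the summaries, not the package's actual code:

```python
import re
import numpy as np

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}  # illustrative subset

def clean_text(text: str, stopwords=STOPWORDS) -> str:
    """Lowercase, drop stopwords, strip non-alphanumeric characters."""
    text = text.lower()
    text = " ".join(word for word in text.split() if word not in stopwords)
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def pad_array(arr, dtype=np.int32):
    """Zero-pad ragged rows out to the longest row's length."""
    max_len = max(len(row) for row in arr)
    padded = np.zeros((len(arr), max_len), dtype=dtype)
    for i, row in enumerate(arr):
        padded[i, : len(row)] = row
    return padded

print(clean_text("Transfer learning with BERT!"))  # transfer learning with bert
print(pad_array([[1, 2, 3], [4, 5]]))
```

Cleaning normalizes raw project titles and descriptions before tokenization; padding makes ragged token-ID batches rectangular so they can be collated into tensors.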
.github.workflows
File | Summary |
---|---|
serve.yaml | GitHub Actions workflow for serving model predictions using API endpoints. Implements fast and scalable prediction-serving infrastructure. |
json_to_md.py | Converts JSON data to Markdown format for project documentation. |
workloads.yaml | CI/CD workflows for model training, evaluation, and deployment using GitHub Actions. Automates the pipeline to build, test, and deploy machine learning models. |
documentation.yaml | Generates documentation for the MLOps course repository using GitHub Actions workflow. |
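Workflows like workloads.yaml above follow the standard GitHub Actions shape. A hedged sketch is shown below; the trigger, Python version, and step commands are assumptions, not the repository's actual configuration:

```yaml
name: workloads   # hypothetical; trigger and step details are assumptions
on:
  workflow_dispatch:
  pull_request:
    branches: [main]
jobs:
  workloads:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      # Install dependencies, then run the code tests.
      - run: python -m pip install -r requirements.txt
      - run: python -m pytest tests/code --verbose --disable-warnings
```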
notebooks
File | Summary |
---|---|
benchmarks.ipynb | Notebook of performance benchmarks for the project. |
madewithml.ipynb | Manages deployment for ML model training on a Kubernetes cluster using the YAML configurations in the deploy directory. |
Requirements

Ensure you have the following dependencies installed on your system:

- Python: version x.y.z

- Clone the mlops-course repository:

```sh
git clone https://github.com/GokuMohandas/mlops-course
```

- Change to the project directory:

```sh
cd mlops-course
```

- Install the dependencies:

```sh
pip install -r requirements.txt
```

Use the following command to run mlops-course:

```sh
python main.py
```

Use the following command to run tests:

```sh
pytest
```
Contributions are welcome! Here are several ways you can contribute:

- Report Issues: Submit bugs found or log feature requests for the mlops-course project.
- Submit Pull Requests: Review open PRs, and submit your own PRs.
- Join the Discussions: Share your insights, provide feedback, or ask questions.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your GitHub account.
- Clone Locally: Clone the forked repository to your local machine using a Git client.

```sh
git clone https://github.com/GokuMohandas/mlops-course
```

- Create a New Branch: Always work on a new branch, giving it a descriptive name.

```sh
git checkout -b new-feature-x
```

- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.

```sh
git commit -m 'Implemented new feature x.'
```

- Push to GitHub: Push the changes to your forked repository.

```sh
git push origin new-feature-x
```

- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
- Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
This project is released under the license detailed in the LICENSE file.