
MLOPS-COURSE

Streamline Your MLOps Journey with Automation


Developed with the software and tools below:

GNU Bash, pre-commit, scikit-learn, Jupyter, YAML, Python, GitHub Actions, pandas, Pytest, Ray, MLflow, NumPy, FastAPI


📍 Overview

The mlops-course project automates model training and hyperparameter tuning, using hyperopt to optimize machine-learning models and improve their performance reliably. The repository includes styling and cleaning tasks for code consistency, cluster-computing environment configurations, and deployment scripts for managing workloads and serving models with FastAPI. Key components such as logging configuration, data-handling utilities, and distributed training of BERT models with PyTorch are integrated into the project. Deployment is handled on Kubernetes clusters through YAML configurations, making model deployment and serving scalable and efficient.
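The tuning workflow described above relies on hyperopt; as a stand-in illustration of the same idea, here is a plain random-search loop. The objective, the search space, and every name below are hypothetical, not taken from the codebase:

```python
import random

def objective(config):
    """Toy stand-in for a model's validation loss (minimum near lr=0.01, dropout=0.3)."""
    return (config["lr"] - 0.01) ** 2 + (config["dropout"] - 0.3) ** 2

def random_search(n_trials=100, seed=0):
    """Sample hyperparameter configs and keep the best-scoring trial."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {
            "lr": rng.uniform(1e-4, 1e-1),
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best, loss = random_search()
```

hyperopt replaces the uniform sampling here with Tree-structured Parzen Estimator (TPE) suggestions, but the outer shape (define an objective, search a space, keep the best trial) is the same.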


📦 Features

| Feature | Description |
| --- | --- |
| ⚙️ Architecture | The project follows a modular architecture with components for model training, evaluation, and serving using Ray, PyTorch, and FastAPI. It supports distributed computing for adaptive training and deployment. |
| 🔩 Code Quality | The codebase maintains high code-quality standards with automated formatting (Black, isort), linting (Flake8), and testing (Pytest). Pre-commit hooks enforce code consistency, ensuring clean and reliable code. |
| 📄 Documentation | Extensive documentation covers setup, usage, and codebase details: README files, inline comments, and docs generated with MkDocs. |
| 🔌 Integrations | Key integrations include MLflow for experiment tracking, GitHub Actions for CI/CD automation, and Ray for distributed computing. External dependencies such as Transformers and Snorkel extend model capabilities. |
| 🧩 Modularity | The codebase is highly modular, aiding reusability and maintainability. Separate modules handle data loading, model training, evaluation, and serving, allowing easy extension and customization. |
| 🧪 Testing | Pytest covers unit and integration tests, and code coverage is tracked with pytest-cov to keep software quality reliable. |
| ⚡️ Performance | Distributed training with Ray and optimized PyTorch model architectures emphasize efficiency, speed, and resource utilization. |
| 🛡️ Security | Security measures include secure user authentication and adherence to best practices for data protection, maintaining data integrity and access control for sensitive information. |
| 📦 Dependencies | Key external libraries such as scikit-learn, transformers, and mlflow handle machine-learning tasks; Flask and FastAPI support web-server functionality. |
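The Testing row above can be made concrete with a minimal pytest-style example. The `clean_text` function is a hypothetical stand-in, not code from the madewithml package:

```python
# A minimal pytest-style unit test; pytest discovers test_* functions
# automatically. The function under test is a hypothetical stand-in.
def clean_text(text: str) -> str:
    """Lowercase and collapse whitespace, a typical preprocessing step."""
    return " ".join(text.lower().split())

def test_clean_text():
    assert clean_text("  Hello   World ") == "hello world"

test_clean_text()  # pytest would run this for you; called here for illustration
```

Running `python -m pytest --cov` over a suite of such functions produces the coverage report mentioned above.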

📂 Repository Structure

```
└── mlops-course/
    ├── .github
    │   └── workflows
    ├── LICENSE
    ├── Makefile
    ├── README.md
    ├── datasets
    │   ├── dataset.csv
    │   ├── holdout.csv
    │   ├── projects.csv
    │   └── tags.csv
    ├── deploy
    │   ├── cluster_compute.yaml
    │   ├── cluster_env.yaml
    │   ├── jobs
    │   └── services
    ├── docs
    │   ├── index.md
    │   └── madewithml
    ├── madewithml
    │   ├── config.py
    │   ├── data.py
    │   ├── evaluate.py
    │   ├── models.py
    │   ├── predict.py
    │   ├── serve.py
    │   ├── train.py
    │   ├── tune.py
    │   └── utils.py
    ├── mkdocs.yml
    ├── notebooks
    │   ├── benchmarks.ipynb
    │   └── madewithml.ipynb
    ├── pyproject.toml
    ├── requirements.txt
    └── tests
        ├── code
        ├── data
        └── model
```

🧩 Modules

.

| File | Summary |
| --- | --- |
| requirements.txt | Automates model training and hyperparameter tuning with hyperopt for optimizing ML models, helping enhance model performance reliably in the MLOps course project. |
| Makefile | Styling and cleaning tasks for the repository, ensuring code consistency and removing unnecessary files for efficient maintenance. |
| pyproject.toml | Manages code formatting with Black, isort, and Flake8, ensuring clean, consistent code across the repository while excluding common directories and specific files from Pytest coverage. |
deploy

| File | Summary |
| --- | --- |
| cluster_env.yaml | Configures the Ray cluster computing environment, including post-build Python package installations. |
| cluster_compute.yaml | Manages cloud resources for the madewithml deployment in us-east2, specifying head- and worker-node types with their configurations and initializing BlockDeviceMappings and TagSpecifications. |
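To make the cluster configs above more concrete, here is a minimal sketch of what a compute config of this kind typically looks like. Field names and instance types are assumptions based on common Ray/Anyscale cluster configs, not copied from the repository's files:

```yaml
# Illustrative shape of a cluster compute config (not the repo's actual file).
region: us-east2
head_node_type:
  name: head
  instance_type: m5.2xlarge
worker_node_types:
  - name: gpu-worker
    instance_type: g4dn.xlarge
    min_workers: 0
    max_workers: 2
```

The real cluster_compute.yaml additionally carries the BlockDeviceMappings and TagSpecifications mentioned in its summary.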
deploy.jobs

| File | Summary |
| --- | --- |
| workloads.yaml | Manages workloads in the cluster environment for a specific project, handling configurations and runtime environment variables. |
| workloads.sh | Automates testing, training, evaluating, and deploying machine learning models on the MadeWithML platform. |
deploy.services

| File | Summary |
| --- | --- |
| serve_model.yaml | Serves a machine learning model with Ray for the madewithml project, defining the runtime environment, upload path, and rollout strategy. |
| serve_model.py | Serves model deployment by fetching artifacts from S3 and configuring the entrypoint based on run ID and threshold. |
madewithml

| File | Summary |
| --- | --- |
| config.py | Manages logging configuration and MLflow setup within the project; sets up directories, the logger, and constraints for effective tracking and monitoring. |
| models.py | Defines a module for fine-tuning a Large Language Model (LLM): it wraps a pretrained LLM and adds dropout and classification layers. |
| predict.py | Predicts tags and probabilities for project titles and descriptions using MLflow experiments and TorchPredictor. |
| serve.py | FastAPI application serving a machine learning model for project classification, with health-check, run-ID retrieval, evaluation, and prediction endpoints. |
| utils.py | Utility functions for reproducible experimentation, data handling, array padding, and tensor conversion within the AI/ML workflow: setting seeds, loading/saving dictionaries, padding arrays, collating batch data, converting dicts to lists, and fetching MLflow run IDs. |
| tune.py | Defines the CLI, configures the tuning workload, conducts training trials, and logs results. |
| train.py | Trains a distributed model using Ray for adaptive training and evaluation, leveraging PyTorch and BERT; a CLI app enables training-config setup and result saving. |
| evaluate.py | CLI script to evaluate model performance metrics on datasets, showing overall and per-class results alongside slice metrics for NLP projects. |
| data.py | Handles dataset loading, stratified splitting, text cleaning, and tokenization; includes a custom preprocessor for data transformation. |
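utils.py is described as handling array padding and batch collation. A plain-Python sketch of those two helpers follows; the names and signatures are assumptions, not the repository's API:

```python
def pad_array(arr, max_len, pad_value=0):
    """Right-pad a sequence to max_len, truncating if it is longer."""
    return (arr + [pad_value] * max_len)[:max_len]

def collate_batch(batch, pad_value=0):
    """Pad every sequence in a batch to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [pad_array(seq, max_len, pad_value) for seq in batch]

collate_batch([[1, 2, 3], [4], [5, 6]])
# → [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
```

In the real workflow the padded batches would then be converted to PyTorch tensors before training.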
.github.workflows

| File | Summary |
| --- | --- |
| serve.yaml | GitHub Actions workflow for serving model predictions via API endpoints, implementing fast and scalable prediction-serving infrastructure. |
| json_to_md.py | Converts JSON data to Markdown format for project documentation. |
| workloads.yaml | CI/CD workflows for model training, evaluation, and deployment using GitHub Actions; automates the pipeline to build, test, and deploy machine learning models. |
| documentation.yaml | Generates documentation for the MLOps course repository via a GitHub Actions workflow. |
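As a rough illustration of what a CI workflow like workloads.yaml might contain, here is a minimal GitHub Actions sketch. The job name, action versions, and steps are assumptions, not the repository's actual workflow:

```yaml
# Illustrative CI workflow (not the repo's actual workloads.yaml).
name: workloads
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python -m pytest tests/code
```

The repository's real workflows additionally cover training, evaluation, and deployment stages.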
notebooks

| File | Summary |
| --- | --- |
| benchmarks.ipynb | Notebook of benchmarking experiments for the project. |
| madewithml.ipynb | Manages deployment for ML model training on a Kubernetes cluster using the YAML configurations in the deploy directory. |

🚀 Getting Started

Requirements

Ensure you have the following dependencies installed on your system:

  • Python: version x.y.z

⚙️ Install

  1. Clone the mlops-course repository:
     git clone https://github.com/GokuMohandas/mlops-course
  2. Change to the project directory:
     cd mlops-course
  3. Install the dependencies:
     pip install -r requirements.txt

► Using mlops-course

The repository has no top-level main.py; instead, run the entry-point scripts under madewithml/, for example:

python madewithml/train.py --help

🧪 Tests

Use the following command to run tests:

pytest

🛠 Project Roadmap

  • ► INSERT-TASK-1
  • ► INSERT-TASK-2
  • ► ...

🤝 Contributing

Contributions are welcome! Here are several ways you can contribute:

Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your GitHub account.
  2. Clone Locally: Clone the forked repository to your local machine using a git client.
    git clone https://github.com/GokuMohandas/mlops-course
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to GitHub: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
  8. Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!


📄 License

This project is released under the SELECT-A-LICENSE license. For more details, refer to the LICENSE file.


👏 Acknowledgments

  • List any resources, contributors, inspiration, etc. here.
