Commit
* Added deploy with modal.
* A few minor fixes
* updated links as per comment
* Updated as per the comments.
* Update docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-modal.md
* Updated
* Updated as per comments
* Updated
* minor fix for relative link
* Incorporated comments and new script provided.
* Added the snippets
* Updated
* Updated
* updated poetry.lock
* Updated "poetry.lock"
* Added "__init__.py"
* Updated snippets.py
* Updated path in MAKEFILE
* Added __init__.py in walkthroughs
* Adjusted for black
* Modified mypy.ini added a pattern module_name_pattern = '[a-zA-Z0-9_\-]+'
* updated
* renamed deploy-a-pipeline with deploy_a_pipeline
* Updated for errors in linting
* small changes
* bring back deploy-a-pipeline
* bring back deploy-a-pipeline in sidebar
* fix path to snippet
* update lock file
* fix path to snippet in tags
* fix Duplicate module named "snippets"
* rename snippets to code, refactor article, fix mypy errors
* fix black errors
* rename code to deploy_snippets
* add pytest testing for modal function
* move example article to the bottom
* update lock file

---------

Co-authored-by: Anton Burnashev <[email protected]>
Co-authored-by: Alena <[email protected]>
1 parent 0c6fd65 · commit f5a64be · Showing 7 changed files with 562 additions and 6 deletions.
113 changes: 113 additions & 0 deletions
docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-modal.md
@@ -0,0 +1,113 @@
---
title: Deploy with Modal
description: How to deploy a pipeline with Modal
keywords: [how to, deploy a pipeline, Modal]
canonical: https://modal.com/blog/analytics-stack
---

# Deploy with Modal

## Introduction to Modal

[Modal](https://modal.com/) is a serverless platform designed for developers. It allows you to run and deploy code in the cloud without managing infrastructure.

With Modal, you can run tasks like generative model inference, large-scale batch jobs, and job queues, all while easily scaling compute resources.

### Modal features

- Serverless Compute: No infrastructure management; scales automatically from zero to thousands of CPUs/GPUs.
- Cloud Functions: Run Python code in the cloud instantly and scale horizontally.
- GPU/CPU Scaling: Easily attach GPUs for heavy tasks like AI model training with a single line of code.
- Web Endpoints: Expose any function as an HTTPS API endpoint quickly.
- Scheduled Jobs: Convert Python functions into scheduled tasks effortlessly.

To learn more, please refer to [Modal's documentation](https://modal.com/docs).
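
To get a feel for the programming model, here is a minimal sketch of a scheduled Modal function; the app name and schedule below are illustrative assumptions, not part of the walkthrough that follows:

```py
import modal

# Minimal sketch: a function that Modal runs on a schedule.
# The app name "hello-schedule" is just an example.
app = modal.App("hello-schedule")


@app.function(schedule=modal.Period(hours=1))
def say_hello() -> None:
    print("Hello from Modal!")  # runs in its own container every hour
```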

## How to run dlt on Modal

Here’s a dlt project setup to copy data from a public MySQL database into DuckDB as a destination:

### Step 1: Initialize source
Run the `dlt init` CLI command to initialize the SQL database source and set up the `sql_database_pipeline.py` template.
```sh
dlt init sql_database duckdb
```

### Step 2: Define Modal Image
Open the `sql_database_pipeline.py` file and define the Modal Image you want to run `dlt` in:
<!--@@@DLT_SNIPPET ./deploy_snippets/deploy-with-modal-snippets.py::modal_image-->

### Step 3: Define Modal Function
A Modal Function is a containerized environment that runs tasks.
It can be scheduled (e.g., daily or on a cron schedule), request more CPU/memory, and scale across
multiple containers.

Here’s how to include your SQL pipeline in the Modal Function:

<!--@@@DLT_SNIPPET ./deploy_snippets/deploy-with-modal-snippets.py::modal_function-->

### Step 4: Set up credentials
You can securely store your credentials using Modal secrets. When you reference secrets within a Modal script,
the defined secret is automatically set as an environment variable. dlt natively supports environment variables,
enabling seamless integration of your credentials. For example, to declare a connection string, you can define it as follows:
```text
SOURCES__SQL_DATABASE__CREDENTIALS=mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam
```
With this environment variable in place, the credentials are automatically picked up by dlt.
For more details, please refer to the [documentation](../../general-usage/credentials/setup#environment-variables).
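
As a sketch of how the secret could be created from the CLI (assuming the `sql-secret` name referenced by the function snippet below, and the Rfam connection string from this walkthrough):

```sh
modal secret create sql-secret \
  "SOURCES__SQL_DATABASE__CREDENTIALS=mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam"
```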

### Step 5: Run pipeline
To run your pipeline a single time, use the following command:
```sh
modal run sql_database_pipeline.py
```

### Step 6: Deploy
If you want to deploy your pipeline on Modal for continuous execution or scheduling, use this command:
```sh
modal deploy sql_database_pipeline.py
```

## Advanced configuration
### Modal Proxy

If your database is in a private VPN, you can use [Modal Proxy](https://modal.com/docs/reference/modal.Proxy) as a bastion server (available for Enterprise customers).
To connect to a production read replica, attach the proxy to the function definition and change the hostname to localhost:
```py
import os

import modal

# Assumes an `app = modal.App(...)` defined as in the snippets above


@app.function(
    secrets=[
        modal.Secret.from_name("postgres-read-replica-prod"),
    ],
    schedule=modal.Cron("24 6 * * *"),
    proxy=modal.Proxy.from_name("prod-postgres-proxy", environment_name="main"),
    timeout=3000,
)
def task_pipeline(dev: bool = False) -> None:
    # The proxy tunnels the database port, so the replica is reachable on localhost
    pg_url = f'postgresql://{os.environ["PGUSER"]}:{os.environ["PGPASSWORD"]}@localhost:{os.environ["PGPORT"]}/{os.environ["PGDATABASE"]}'
    # ... use pg_url as the connection string for your pipeline
```

### Capturing deletes
To capture updates or deleted rows from your Postgres database, consider using dlt's [Postgres CDC replication feature](../../dlt-ecosystem/verified-sources/pg_replication), which is
useful for tracking changes and deletions in the data.
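
As a sketch, the replication source can be scaffolded the same way as the SQL source in Step 1; the DuckDB destination here is an assumption, so pick whichever destination you use:

```sh
dlt init pg_replication duckdb
```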

### Sync Multiple Tables in Parallel
To sync multiple tables in parallel, map each table copy job to a separate container using [Modal.starmap](https://modal.com/docs/reference/modal.Function#starmap):

```py
@app.function(timeout=3000, schedule=modal.Cron("29 11 * * *"))
def main(dev: bool = False):
    tables = [
        ("task", "enqueued_at", dev),
        ("worker", "launched_at", dev),
        ...
    ]
    # Each tuple is unpacked as the arguments of one load_table_from_database
    # call, and each call runs in its own container
    list(load_table_from_database.starmap(tables))
```
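
The snippet above assumes a `load_table_from_database` Modal function defined elsewhere. A hypothetical sketch of what it might look like follows; the signature, incremental hint, and DuckDB destination are all illustrative assumptions:

```py
@app.function(timeout=3000)
def load_table_from_database(table: str, incremental_col: str, dev: bool = False) -> None:
    import dlt
    from dlt.sources.sql_database import sql_database

    # Load a single table, using the given column for incremental loading
    source = sql_database().with_resources(table)
    source.resources[table].apply_hints(incremental=dlt.sources.incremental(incremental_col))

    pipeline = dlt.pipeline(
        pipeline_name=f"{table}_to_duckdb",
        destination="duckdb",
        dataset_name="dev_data" if dev else "prod_data",
    )
    print(pipeline.run(source))
```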

## More examples

For a practical, real-world example, check out the article ["Building a Cost-Effective Analytics Stack with Modal, dlt, and dbt"](https://modal.com/blog/analytics-stack).

This article illustrates how to automate a workflow for loading data from Postgres into Snowflake using dlt, providing valuable insights into building an efficient analytics pipeline.
Empty file.
68 changes: 68 additions & 0 deletions
...website/docs/walkthroughs/deploy-a-pipeline/deploy_snippets/deploy-with-modal-snippets.py
@@ -0,0 +1,68 @@
import os

import modal
from tests.pipeline.utils import assert_load_info

# @@@DLT_SNIPPET_START modal_image
# Define the Modal Image
image = modal.Image.debian_slim().pip_install(
    "dlt>=1.1.0",
    "dlt[duckdb]",  # destination
    "dlt[sql_database]",  # source (MySQL)
    "pymysql",  # database driver for MySQL source
)

app = modal.App("example-dlt", image=image)

# Modal Volume used to store the duckdb database file
vol = modal.Volume.from_name("duckdb-vol", create_if_missing=True)
# @@@DLT_SNIPPET_END modal_image


# @@@DLT_SNIPPET_START modal_function
@app.function(
    volumes={"/data/": vol},
    schedule=modal.Period(days=1),
    secrets=[modal.Secret.from_name("sql-secret")],
)
def load_tables() -> None:
    import dlt
    from dlt.sources.sql_database import sql_database

    # Define the source database credentials; in production, you would save this
    # as a Modal Secret which can be referenced here as an environment variable
    os.environ["SOURCES__SQL_DATABASE__CREDENTIALS"] = (
        "mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam"
    )
    # Load tables "family" and "genome"
    source = sql_database().with_resources("family", "genome")

    # Create dlt pipeline object
    pipeline = dlt.pipeline(
        pipeline_name="sql_to_duckdb_pipeline",
        destination=dlt.destinations.duckdb(
            "/data/rfam.duckdb"
        ),  # write the duckdb database file to this location, which is mounted to the Modal Volume
        dataset_name="sql_to_duckdb_pipeline_data",
        progress="log",  # output progress of the pipeline
    )

    # Run the pipeline
    load_info = pipeline.run(source)

    # Print run statistics
    print(load_info)
    # @@@DLT_SNIPPET_END modal_function

    assert_load_info(load_info)


def test_modal_snippet() -> None:
    import pytest
    from modal.exception import ExecutionError

    # Calling .remote() outside a running Modal App should raise an ExecutionError
    with pytest.raises(ExecutionError) as excinfo:
        load_tables.remote()
    # >> modal.exception.ExecutionError:
    # >> Function has not been hydrated with the metadata it needs to run on Modal,
    # >> because the App it is defined on is not running.
    assert "hydrated" in str(excinfo.value)