Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SNOW-90] Introduce CI job to test schemachange updates against a clone #78

Open
wants to merge 79 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
ed485ba
Initial commit. A simple clone of the DB
jaymedina Oct 23, 2024
b388915
Install snowsql
jaymedina Oct 23, 2024
e3c575a
add pw
jaymedina Oct 23, 2024
ae8d75d
add debug step
jaymedina Oct 23, 2024
4371e0b
.
jaymedina Oct 23, 2024
c9989e5
debugging
jaymedina Oct 23, 2024
1b136f0
syntax
jaymedina Oct 23, 2024
680ecaa
directly pass vars
jaymedina Oct 23, 2024
7d946ed
new .sql file to run the command isntead
jaymedina Oct 23, 2024
a342f85
global vars
jaymedina Oct 23, 2024
244216b
dynamically change the name of the clone DB
jaymedina Oct 23, 2024
4f35760
syntax
jaymedina Oct 23, 2024
ab2a048
syntax
jaymedina Oct 23, 2024
be2e82f
using IDENTIFIER for db object
jaymedina Oct 23, 2024
fb4a208
.
jaymedina Oct 23, 2024
913ef4c
.
jaymedina Oct 23, 2024
ded7408
set var sub true
jaymedina Oct 23, 2024
b25b10d
add identifier
jaymedina Oct 23, 2024
2bdc6ef
.
jaymedina Oct 23, 2024
2090fad
.
jaymedina Oct 23, 2024
b0ebf0c
typo
jaymedina Oct 23, 2024
d91d5ab
.
jaymedina Oct 23, 2024
6a7b631
.
jaymedina Oct 23, 2024
d8c8927
?
jaymedina Oct 23, 2024
38da909
.
jaymedina Oct 23, 2024
c75abba
go back to running command in ci step itself
jaymedina Oct 23, 2024
1a5160f
role is data_engineer
jaymedina Oct 23, 2024
1edbb83
branch sha
jaymedina Oct 23, 2024
1bc7f04
Run schemachagne on the clone
jaymedina Oct 24, 2024
a94ea31
.
jaymedina Oct 24, 2024
cee2c5e
.
jaymedina Oct 24, 2024
df3a91f
remove clone_db.sql
jaymedina Oct 24, 2024
46c8390
label name changed
jaymedina Oct 25, 2024
ca73fb6
change job name. change clone name convention. fix typos. new delete_…
jaymedina Oct 28, 2024
6fe39ca
address syntax issues in clone name
jaymedina Oct 28, 2024
88c7b00
add a status message
jaymedina Oct 28, 2024
939c9f7
missing env var
jaymedina Oct 28, 2024
4116019
vars
jaymedina Oct 28, 2024
bac91b8
sysadmin to run schemachange
jaymedina Oct 28, 2024
1ffafa9
syntax fix
jaymedina Oct 28, 2024
c44ae9e
refactor and introduce drop_clone job
jaymedina Oct 28, 2024
a3eade5
no refactoring
jaymedina Oct 28, 2024
d76999f
drop_clone will only run after PR merging
jaymedina Oct 28, 2024
89f06b4
new contributing.md
jaymedina Oct 28, 2024
93253a2
updating contributing
jaymedina Oct 29, 2024
6e0d537
move tip
jaymedina Oct 29, 2024
9b187c1
.
jaymedina Oct 29, 2024
3bbd05c
.
jaymedina Oct 29, 2024
4f042c0
.
jaymedina Oct 29, 2024
d2a2452
.
jaymedina Oct 29, 2024
ba646fa
remove redundancies from drop_clone job
jaymedina Oct 29, 2024
6e24035
testing
jaymedina Oct 29, 2024
fc3a85d
testing done
jaymedina Oct 29, 2024
cb218a5
final comment on time travel
jaymedina Oct 29, 2024
06996ab
Update CONTRIBUTING.md
jaymedina Oct 29, 2024
ea07518
Update CONTRIBUTING.md
jaymedina Oct 29, 2024
9d4aaaa
fix typo. small formatting update.
jaymedina Oct 29, 2024
03dfc5a
only run create_clone on open prs. run drop_clone on closed and merge…
jaymedina Oct 30, 2024
ab9297a
improve clarity in contributing.md
jaymedina Oct 30, 2024
8498feb
rephrasing
jaymedina Oct 31, 2024
ed645f3
use snowflake cli
jaymedina Nov 4, 2024
ced4d2c
snowflake connection
jaymedina Nov 4, 2024
3a90e92
.
jaymedina Nov 4, 2024
a85a5ca
lol
jaymedina Nov 4, 2024
8e78bd8
.
jaymedina Nov 4, 2024
294244f
.
jaymedina Nov 4, 2024
f75fb49
we no longer need to auth via the query command
jaymedina Nov 4, 2024
65cfebb
run for every commit when the PR is labeled
jaymedina Nov 5, 2024
9d8fde0
V script to make dummy table
jaymedina Nov 5, 2024
6f2052a
quick change
jaymedina Nov 5, 2024
55cbfd8
rename V to see if that fixes the problem
jaymedina Nov 5, 2024
9a622ca
change to R script
jaymedina Nov 5, 2024
2c9bab2
rename table. update drop_clone job.
jaymedina Nov 5, 2024
912b974
.
jaymedina Nov 5, 2024
d0880d8
try running schemachange as data_engineer
jaymedina Nov 5, 2024
c6ba8d5
.
jaymedina Nov 5, 2024
8b50065
grant privileges to dpe_engineer
jaymedina Nov 6, 2024
00c2a3c
attempt at using dpe_engineer for schematic again
jaymedina Nov 6, 2024
5babc72
put both configs in the same file
jaymedina Nov 6, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
167 changes: 167 additions & 0 deletions .github/workflows/test_with_clone.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
---
name: Test Changes with Cloned DB

on:
pull_request:
types: [ labeled, synchronize, closed ]
push:

permissions:
contents: read

jobs:

create_clone_and_run_schemachange:
runs-on: ubuntu-latest
if: contains(github.event.pull_request.labels.*.name, 'create_clone_and_run_schemachange') && github.event.pull_request.state == 'open'
environment: dev
env:
SNOWFLAKE_PASSWORD: ${{ secrets.SNOWSQL_PWD }}
SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWSQL_ACCOUNT }}
SNOWFLAKE_USER: ${{ secrets.SNOWSQL_USER }}
SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWSQL_WAREHOUSE }}
SNOWFLAKE_CLONE_ROLE: DATA_ENGINEER
SNOWFLAKE_SCHEMACHANGE_ROLE: SYSADMIN
SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE_ORIG: ${{ vars.SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE }}
SNOWFLAKE_SYNAPSE_STAGE_STORAGE_INTEGRATION: ${{ vars.SNOWFLAKE_SYNAPSE_STAGE_STORAGE_INTEGRATION }}
SNOWFLAKE_SYNAPSE_STAGE_URL: ${{ vars.SNOWFLAKE_SYNAPSE_STAGE_URL }}
CLONE_NAME: "${{ vars.SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE }}_${{ github.head_ref }}"
STACK: ${{ vars.STACK }}

steps:

- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Install python libraries
shell: bash
run: |
pip install schemachange==3.6.1
pip install numpy==1.26.4
pip install pandas==1.5.3

- name: Configure Snowflake connections
run: |
# Config file for DPE_ENGINEER
config_file_dpe=$(mktemp)
echo 'default_connection_name = "dpe"' >> $config_file_dpe
echo '[connections.dpe]' >> $config_file_dpe
echo "account = \"${SNOWFLAKE_ACCOUNT}\"" >> $config_file_dpe
echo "user = \"${SNOWFLAKE_USER}\"" >> $config_file_dpe
echo "role = \"${SNOWFLAKE_CLONE_ROLE}\"" >> $config_file_dpe
echo "password = \"${SNOWFLAKE_PASSWORD}\"" >> $config_file_dpe
echo "warehouse = \"${SNOWFLAKE_WAREHOUSE}\"" >> $config_file_dpe
echo 'authenticator = "SNOWFLAKE"' >> $config_file_dpe

# Config file for SYSADMIN
echo '[connections.sysadmin]' >> $config_file_dpe
echo "account = \"${SNOWFLAKE_ACCOUNT}\"" >> $config_file_dpe
echo "user = \"${SNOWFLAKE_USER}\"" >> $config_file_dpe
echo "role = \"${SNOWFLAKE_SCHEMACHANGE_ROLE}\"" >> $config_file_dpe
echo "password = \"${SNOWFLAKE_PASSWORD}\"" >> $config_file_dpe
echo "warehouse = \"${SNOWFLAKE_WAREHOUSE}\"" >> $config_file_dpe
echo 'authenticator = "SNOWFLAKE"' >> $config_file_dpe

# Write config paths to environment
echo "SNOWFLAKE_CONFIG_PATH_DPE=$config_file_dpe" >> $GITHUB_ENV

- name: Install Snowflake CLI with DPE_ENGINEER config
uses: Snowflake-Labs/[email protected]
with:
default-config-file-path: ${{ env.SNOWFLAKE_CONFIG_PATH_DPE }}

- name: Verify Snowflake CLI installation and connections
run: |
snow --version
snow connection test -c sysadmin
snow connection test -c dpe

- name: Sanitize Clone Name
run: |
CLONE_NAME_SANITIZED="${CLONE_NAME//[^a-zA-Z0-9_]/_}"
jaymedina marked this conversation as resolved.
Show resolved Hide resolved
echo "Clone name has been updated! The clone name will be: ${CLONE_NAME_SANITIZED}"
echo "SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE=${CLONE_NAME_SANITIZED}" >> $GITHUB_ENV
jaymedina marked this conversation as resolved.
Show resolved Hide resolved

- name: Zero-copy clone the database
shell: bash
run: |
snow sql -q "CREATE OR REPLACE DATABASE $SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE CLONE $SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE_ORIG;"

- name: Grant permissions to DPE_ENGINEER on cloned database
shell: bash
run: |
snow connection set-default sysadmin
snow sql -q "GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA ${SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE}.SYNAPSE TO ROLE DPE_ENGINEER;"
snow sql -q "GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA ${SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE}.SYNAPSE_RAW TO ROLE DPE_ENGINEER;"

- name: Run schemachange on the clone as DPE_ENGINEER
shell: bash
run: |
schemachange \
-f synapse_data_warehouse \
-a $SNOWFLAKE_ACCOUNT \
-u $SNOWFLAKE_USER \
-r $SNOWFLAKE_CLONE_ROLE \
-w $SNOWFLAKE_WAREHOUSE \
--config-folder synapse_data_warehouse

drop_clone:
runs-on: ubuntu-latest
if: github.event.pull_request.merged == true || github.event.action == 'closed'
environment: dev
env:
SNOWFLAKE_PASSWORD: ${{ secrets.SNOWSQL_PWD }}
SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWSQL_ACCOUNT }}
SNOWFLAKE_USER: ${{ secrets.SNOWSQL_USER }}
SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWSQL_WAREHOUSE }}
SNOWFLAKE_CLONE_ROLE: DATA_ENGINEER
CLONE_NAME: "${{ vars.SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE }}_${{ github.head_ref }}"

steps:

- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Configure Snowflake connection
run: |
# Create temporary files for config.toml and our private key
config_file=$(mktemp)

# Write to config.toml file
echo 'default_connection_name = "dpe"' >> $config_file
echo '[connections.dpe]' >> $config_file
echo "account = \"${SNOWFLAKE_ACCOUNT}\"" >> $config_file
echo "user = \"${SNOWFLAKE_USER}\"" >> $config_file
echo "role = \"${SNOWFLAKE_CLONE_ROLE}\"" >> $config_file
echo "password = \"${SNOWFLAKE_PASSWORD}\"" >> $config_file
echo "warehouse = \"${SNOWFLAKE_WAREHOUSE}\"" >> $config_file
echo 'authenticator = "SNOWFLAKE"' >> $config_file

# Write config.toml path to global environment
echo "SNOWFLAKE_CONFIG_PATH=$config_file" >> $GITHUB_ENV

- name: Install Snowflake CLI
uses: Snowflake-Labs/[email protected]
with:
default-config-file-path: ${{ env.SNOWFLAKE_CONFIG_PATH }}

- name: Verify Snowflake CLI installation and connection
run: |
snow --version
snow connection test

- name: Sanitize Clone Name
run: |
CLONE_NAME_SANITIZED="${CLONE_NAME//[^a-zA-Z0-9_]/_}"
echo "SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE=${CLONE_NAME_SANITIZED}" >> $GITHUB_ENV
echo "Clone name has been updated! The clone name will be: ${SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE}"
echo $SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE

- name: Drop the clone
shell: bash
run: |
snow sql -r $SNOWFLAKE_CLONE_ROLE -q "DROP DATABASE IF EXISTS $SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE;"
109 changes: 109 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
# Contributing Guidelines

Welcome, and thanks for your interest in contributing to the `snowflake` repository! :snowflake:

By contributing, you are agreeing that we may redistribute your work under this [license](https://github.com/Sage-Bionetworks/snowflake/tree/snow-90-auto-db-clone?tab=License-1-ov-file#).

## Getting Started

To start contributing, follow these steps to set up and develop on your local repository:

### 1. Clone the Repository

```bash
git clone https://github.com/Sage-Bionetworks/snowflake
```

### 2. Fetch the Latest `dev` Branch

After cloning, navigate to the repository directory:

```bash
cd snowflake
```

Then, fetch the latest updates from the `dev` branch to ensure you’re working with the latest codebase:

```bash
git fetch origin dev
```

### 3. Create a New Branch Off `dev`

Create and checkout your feature branch from the latest `dev` branch. Name it based on the Jira ticket number and your feature/fix. For example:

```bash
git checkout -b snow-123-new-feature origin/dev
```

Your branch will now be tracking `origin/dev` which you can merge with or rebase onto should a merge conflict occur. For more guidance
on how to resolve merge conflicts, [see here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/about-merge-conflicts#resolving-merge-conflicts).

> [!IMPORTANT]
> If you plan to run the automated testing described in section [Running CI Jobs for Database Testing](#running-ci-jobs-for-database-testing), your branch name needs to start with `snow-`,
> otherwise the test deployment will fail.

### 4. Push to The Remote Branch

Once you've made your changes and committed them locally, push your branch to the remote repository:

```
git push origin snow-123-new-feature
```

### 5. Create a Draft Pull Request

In order to initiate automated testing you will need to work on a draft pull request (PR) on GitHub. After pushing your commits to
the remote branch in Step 4, use the GitHub UI to initate a PR and convert it to draft mode.
jaymedina marked this conversation as resolved.
Show resolved Hide resolved

After testing your changes against `schemachange` using the instructions in [Running CI Jobs for Database Testing](#running-ci-jobs-for-database-testing),
you can then take your PR out of draft mode by marking it as Ready for Review in the GitHub UI.

## Running CI Jobs for Database Testing

This repository includes automated CI jobs to validate changes against a cloned database. If you want to trigger these jobs to test your changes in an isolated database environment, please follow the steps below:

### 1. Add the Label

Add the label `create_clone_and_run_schemachange` to your PR to trigger the CI workflow. This job does two things:

* Creates a zero-copy clone of the database and runs your proposed schema changes against it.
* Tests your schema changes on a cloned version of the development database, verifying that your updates work correctly without
affecting the real development database. After the PR is merged, the clone is automatically dropped to free up resources.

> [!IMPORTANT]
> Your cloned database is a clone of the development database as it exists at the time of cloning. Please be mindful that
> **there may have been changes made to the development database since your last clone**.

> [!NOTE]
> As you are developing on your branch, you may want to re-run the `schemachange` test on your updates.
> You can unlabel and relabel the PR with `create_clone_and_run_schemachange` to re-trigger the job.

### 2. Perform Inspection using Snowsight

You can go on Snowsight to perform manual inspection of the schema changes in your cloned database. We recommend using a SQL worksheet for manual quality assurance queries, e.g. ensure there is no row duplication in the new/updated tables.

> [!TIP]
> Your database will be named after your feature branch so it's easy to find on Snowsight. For example, if your feature branch is called
> `snow-123-new-feature`, your database might be called `SYNAPSE_DATA_WAREHOUSE_DEV_SNOW_123_NEW_FEATURE`.

### 3. Manually Drop the Cloned Database (Optional)

There is a second job in the repository (`drop_clone`) that will drop your branch's database clone once it has been merged into `dev`.
In other words, once your cloned database is created for testing, it will remain open until your PR is closed (unless you manually drop it).

An initial clone of the development database will not incur new resource costs, **HOWEVER**, when a clone deviates from the original
(e.g. new schema changes are applied for testing), the cloned database will begin to incur costs the longer it exists in our warehouse.
**Please be mindful of the amount of time your PR stays open**, as cloned databases do not get dropped until a PR is merged. For example, if your PR is open for >1 week, consider manually dropping your cloned database on Snowflake to avoid unnecessary cost.

> [!NOTE]
> Keep in mind that after dropping your cloned database, you will still have access to it through Snowflake's "Time Travel"
> feature. Your database is retained through "Time Travel" for X amount of time before it is permanently deleted. To see
> how long your database can be accessed after dropping, run the following query in a SQL worksheet on Snowsight and look
> for the keyword `DATA_RETENTION_TIME_IN_DAYS`:
>
> ```
> SHOW PARAMETERS IN DATABASE <your-database-name>;
> ```

Following these guidelines helps maintain a clean, efficient, and well-tested codebase. Thank you for contributing!
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ pip install "snowflake-connector-python[pandas]" "synapseclient[pandas]" python-

## Contributing

WIP
For contribution guidelines, please see the `CONTRIBUTING.md` file in this repository.

## Visualizing with Streamlit

Expand Down
17 changes: 17 additions & 0 deletions synapse_data_warehouse/synapse/tables/R__my_table.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
USE SCHEMA {{database_name}}.synapse; --noqa: JJ01,PRS,TMP

-- Create the dummy table with example columns
CREATE TABLE IF NOT EXISTS my_table2 (
id INT,
name STRING,
created_at TIMESTAMP,
value FLOAT
);

-- Insert arbitrary rows into the dummy table
INSERT INTO my_table2 (id, name, created_at, value) VALUES
(1, 'Alpha', CURRENT_TIMESTAMP, 1000.5),
(2, 'Beta', CURRENT_TIMESTAMP, 20.0),
(3, 'Gamma', CURRENT_TIMESTAMP, 30.75),
(4, 'Delta', CURRENT_TIMESTAMP, 40.1),
(5, 'Epsilon', CURRENT_TIMESTAMP, 50.9);
Loading