[SNOW-90] Introduce CI job to test schemachange updates against a clone (#78)
Open: jaymedina wants to merge 79 commits into `dev` from `snow-90-auto-db-clone`.
Commits (79 total, all by jaymedina; the diff below shows changes from 57 commits):

- `ed485ba` Initial commit. A simple clone of the DB
- `b388915` Install snowsql
- `e3c575a` add pw
- `ae8d75d` add debug step
- `4371e0b` .
- `c9989e5` debugging
- `1b136f0` syntax
- `680ecaa` directly pass vars
- `7d946ed` new .sql file to run the command isntead
- `a342f85` global vars
- `244216b` dynamically change the name of the clone DB
- `4f35760` syntax
- `ab2a048` syntax
- `be2e82f` using IDENTIFIER for db object
- `fb4a208` .
- `913ef4c` .
- `ded7408` set var sub true
- `b25b10d` add identifier
- `2bdc6ef` .
- `2090fad` .
- `b0ebf0c` typo
- `d91d5ab` .
- `6a7b631` .
- `d8c8927` ?
- `38da909` .
- `c75abba` go back to running command in ci step itself
- `1a5160f` role is data_engineer
- `1edbb83` branch sha
- `1bc7f04` Run schemachagne on the clone
- `a94ea31` .
- `cee2c5e` .
- `df3a91f` remove clone_db.sql
- `46c8390` label name changed
- `ca73fb6` change job name. change clone name convention. fix typos. new delete_…
- `6fe39ca` address syntax issues in clone name
- `88c7b00` add a status message
- `939c9f7` missing env var
- `4116019` vars
- `bac91b8` sysadmin to run schemachange
- `1ffafa9` syntax fix
- `c44ae9e` refactor and introduce drop_clone job
- `a3eade5` no refactoring
- `d76999f` drop_clone will only run after PR merging
- `89f06b4` new contributing.md
- `93253a2` updating contributing
- `6e0d537` move tip
- `9b187c1` .
- `3bbd05c` .
- `4f042c0` .
- `d2a2452` .
- `ba646fa` remove redundancies from drop_clone job
- `6e24035` testing
- `fc3a85d` testing done
- `cb218a5` final comment on time travel
- `06996ab` Update CONTRIBUTING.md
- `ea07518` Update CONTRIBUTING.md
- `9d4aaaa` fix typo. small formatting update.
- `03dfc5a` only run create_clone on open prs. run drop_clone on closed and merge…
- `ab9297a` improve clarity in contributing.md
- `8498feb` rephrasing
- `ed645f3` use snowflake cli
- `ced4d2c` snowflake connection
- `3a90e92` .
- `a85a5ca` lol
- `8e78bd8` .
- `294244f` .
- `f75fb49` we no longer need to auth via the query command
- `65cfebb` run for every commit when the PR is labeled
- `9d8fde0` V script to make dummy table
- `6f2052a` quick change
- `55cbfd8` rename V to see if that fixes the problem
- `9a622ca` change to R script
- `2c9bab2` rename table. update drop_clone job.
- `912b974` .
- `d0880d8` try running schemachange as data_engineer
- `c6ba8d5` .
- `8b50065` grant privileges to dpe_engineer
- `00c2a3c` attempt at using dpe_engineer for schematic again
- `5babc72` put both configs in the same file
New GitHub Actions workflow (111 added lines):

```yaml
---
name: Test Changes with Cloned DB

on:
  pull_request:
    types: [ labeled, closed ]

permissions:
  contents: read

jobs:

  create_clone_and_run_schemachange:
    runs-on: ubuntu-latest
    if: contains(github.event.pull_request.labels.*.name, 'create_clone_and_run_schemachange')

    environment: dev
    env:
      SNOWSQL_PWD: ${{ secrets.SNOWSQL_PWD }}
      SNOWSQL_ACCOUNT: ${{ secrets.SNOWSQL_ACCOUNT }}
      SNOWSQL_USER: ${{ secrets.SNOWSQL_USER }}
      SNOWSQL_WAREHOUSE: ${{ secrets.SNOWSQL_WAREHOUSE }}
      SNOWSQL_CLONE_ROLE: DATA_ENGINEER
      SNOWSQL_SCHEMACHANGE_ROLE: SYSADMIN
      SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE_ORIG: ${{ vars.SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE }}
      SNOWFLAKE_SYNAPSE_STAGE_STORAGE_INTEGRATION: ${{ vars.SNOWFLAKE_SYNAPSE_STAGE_STORAGE_INTEGRATION }}
      SNOWFLAKE_SYNAPSE_STAGE_URL: ${{ vars.SNOWFLAKE_SYNAPSE_STAGE_URL }}
      CLONE_NAME: "${{ vars.SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE }}_${{ github.head_ref }}"
      STACK: ${{ vars.STACK }}

    steps:

      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install python libraries
        shell: bash
        run: |
          pip install schemachange==3.6.1
          pip install numpy==1.26.4
          pip install pandas==1.5.3

      - name: Install SnowSQL
        run: |
          curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.9-linux_x86_64.bash
          SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.9-linux_x86_64.bash

      - name: Verify SnowSQL installation
        run: ~/bin/snowsql -v

      - name: Sanitize Clone Name
        run: |
          CLONE_NAME_SANITIZED="${CLONE_NAME//[^a-zA-Z0-9_]/_}"
          echo "Clone name has been updated! The clone name will be: ${CLONE_NAME_SANITIZED}"
          echo "SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE=${CLONE_NAME_SANITIZED}" >> $GITHUB_ENV

      - name: Zero-copy clone the database
        shell: bash
        run: |
          ~/bin/snowsql -r $SNOWSQL_CLONE_ROLE -q "CREATE OR REPLACE DATABASE $SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE CLONE $SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE_ORIG;"

      - name: Run schemachange on the clone
        shell: bash
        run: |
          schemachange \
            -f synapse_data_warehouse \
            -a $SNOWSQL_ACCOUNT \
            -u $SNOWSQL_USER \
            -r $SNOWSQL_SCHEMACHANGE_ROLE \
            -w $SNOWSQL_WAREHOUSE \
            --config-folder synapse_data_warehouse

  drop_clone:
    runs-on: ubuntu-latest
    if: github.event.pull_request.merged == true
    environment: dev
    env:
      SNOWSQL_PWD: ${{ secrets.SNOWSQL_PWD }}
      SNOWSQL_ACCOUNT: ${{ secrets.SNOWSQL_ACCOUNT }}
      SNOWSQL_USER: ${{ secrets.SNOWSQL_USER }}
      SNOWSQL_WAREHOUSE: ${{ secrets.SNOWSQL_WAREHOUSE }}
      SNOWSQL_CLONE_ROLE: DATA_ENGINEER
      CLONE_NAME: "${{ vars.SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE }}_${{ github.head_ref }}"

    steps:

      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install SnowSQL
        run: |
          curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.9-linux_x86_64.bash
          SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.9-linux_x86_64.bash

      - name: Verify SnowSQL installation
        run: ~/bin/snowsql -v

      - name: Sanitize Clone Name
        run: |
          CLONE_NAME_SANITIZED="${CLONE_NAME//[^a-zA-Z0-9_]/_}"
          echo "SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE=${CLONE_NAME_SANITIZED}" >> $GITHUB_ENV
          # Values written to GITHUB_ENV only become visible in later steps, so echo the local variable here.
          echo "Clone name has been updated! The clone name will be: ${CLONE_NAME_SANITIZED}"

      - name: Drop the clone
        shell: bash
        run: |
          ~/bin/snowsql -r $SNOWSQL_CLONE_ROLE -q "DROP DATABASE IF EXISTS $SNOWFLAKE_SYNAPSE_DATA_WAREHOUSE_DATABASE;"
```
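The `Sanitize Clone Name` steps rely on bash parameter expansion to turn the branch-derived `CLONE_NAME` into a valid Snowflake identifier. A minimal sketch of what that substitution does, using a hypothetical branch-derived value:

```bash
# Hypothetical illustration of the substitution used in the "Sanitize Clone Name" steps above.
CLONE_NAME="SYNAPSE_DATA_WAREHOUSE_DEV_snow-123-new-feature"   # hypothetical value of the env var
CLONE_NAME_SANITIZED="${CLONE_NAME//[^a-zA-Z0-9_]/_}"          # replace every character outside [a-zA-Z0-9_] with "_"
echo "$CLONE_NAME_SANITIZED"                                   # prints: SYNAPSE_DATA_WAREHOUSE_DEV_snow_123_new_feature
```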
# Contributing Guidelines

Welcome, and thanks for your interest in contributing to the `snowflake` repository! :snowflake:

By contributing, you are agreeing that we may redistribute your work under this [license](https://github.com/Sage-Bionetworks/snowflake/tree/snow-90-auto-db-clone?tab=License-1-ov-file#).

## Getting Started

To start contributing, follow these steps to set up and develop on your local repository:

### 1. Clone the Repository

```bash
git clone https://github.com/Sage-Bionetworks/snowflake
```

### 2. Fetch the Latest `dev` Branch

After cloning, navigate to the repository directory:

```bash
cd snowflake
```

Then, fetch the latest updates from the `dev` branch to ensure you're working with the latest codebase:

```bash
git fetch origin dev
```

### 3. Create a New Branch Off `dev`

Create and check out your feature branch from the latest `dev` branch. Name it based on the Jira ticket number and your feature/fix. For example:

```bash
git checkout -b snow-123-new-feature origin/dev
```

Your branch will now track `origin/dev`, which you can merge with or rebase onto should a merge conflict occur. For more guidance on how to resolve merge conflicts, [see here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/about-merge-conflicts#resolving-merge-conflicts).
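If `dev` moves ahead of your branch while you work, one way to bring your branch up to date is sketched below (this assumes `origin/dev` is your base, as set up above):

```bash
# Pull the latest dev commits and replay your feature branch on top of them.
git fetch origin dev
git rebase origin/dev        # or: git merge origin/dev
# If conflicts occur, resolve them in your editor, then:
git add <resolved-files>
git rebase --continue
```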
> [!IMPORTANT]
> If you plan to run the automated testing described in the section [Running CI Jobs for Database Testing](#running-ci-jobs-for-database-testing), your branch name needs to start with `snow-`, otherwise the test deployment will fail.

### 4. Push to the Remote Branch

Once you've made your changes and committed them locally, push your branch to the remote repository:

```bash
git push origin snow-123-new-feature
```

### 5. Create a Draft Pull Request

To initiate automated testing, you will need to work on a draft pull request (PR) on GitHub. After pushing your commits to the remote branch in Step 4, use the GitHub UI to initiate a PR and convert it to draft mode.

## Running CI Jobs for Database Testing

This repository includes automated CI jobs to validate changes against a cloned database. If you want to trigger these jobs to test your changes in an isolated database environment, follow the steps below:

### 1. Add the Label

Add the label `create_clone_and_run_schemachange` to your PR to trigger the CI workflow. This job does two things:

* Creates a zero-copy clone of the development database.
* Runs your proposed schema changes (via schemachange) against the clone, verifying that your updates work correctly without affecting the real development database. After the PR is merged, the clone is automatically dropped to free up resources.
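If you prefer the command line, one way to add the label is with the GitHub CLI (this assumes `gh` is installed and authenticated; the PR number below is hypothetical):

```bash
# Add the trigger label to an existing pull request (PR number 123 is hypothetical)
gh pr edit 123 --add-label create_clone_and_run_schemachange
```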
> [!IMPORTANT]
> Your cloned database is a clone of the development database as it exists at the time of cloning. Please be mindful that **there may be other developers working on their own version of the development database, whose changes may not be reflected in your clone**. Keep an eye out for other PRs with the `create_clone_and_run_schemachange` label, and ensure that you are not performing changes on the same tables to avoid conflicts.

### 2. Perform Inspection Using Snowsight

You can go on Snowsight to perform manual inspection of the schema changes in your cloned database. We recommend using a SQL worksheet for manual quality assurance queries, e.g. to ensure there is no row duplication in the new/updated tables.
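As one concrete example of such a check (the database, schema, table, and column names here are hypothetical), a duplicate-row query can be run from a Snowsight worksheet, or via SnowSQL as in the CI jobs:

```bash
# Hypothetical QA check: look for duplicate IDs in a table touched by your schema changes.
# Replace the database, schema, table, and key column with your own names.
~/bin/snowsql -r DATA_ENGINEER -q "
  SELECT ID, COUNT(*) AS N_ROWS
  FROM SYNAPSE_DATA_WAREHOUSE_DEV_SNOW_123_NEW_FEATURE.MY_SCHEMA.MY_TABLE
  GROUP BY ID
  HAVING COUNT(*) > 1;"
```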
> [!TIP]
> Your database will be named after your feature branch, so it's easy to find on Snowsight. For example, if your feature branch is called `snow-123-new-feature`, your database might be called `SYNAPSE_DATA_WAREHOUSE_DEV_SNOW_123_NEW_FEATURE`.

### 3. Manually Drop the Cloned Database (Optional)

There is a second job in the repository (`drop_clone`) that drops your branch's database clone once the PR has been merged into `dev`. In other words, once your cloned database is created for testing, it persists until your PR is merged (unless you manually drop it).

An initial clone of the development database will not incur new resource costs. **However**, once a clone deviates from the original (e.g. new schema changes are applied for testing), the cloned database begins to incur costs for as long as it exists in our warehouse. **Please be mindful of how long your PR stays open**, as cloned databases are not dropped until the PR is merged. For example, if your PR is open for more than a week, consider manually dropping your cloned database on Snowflake to avoid unnecessary cost.
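Dropping the clone manually mirrors what the `drop_clone` job does; a minimal sketch using the same SnowSQL setup as the CI jobs (the clone name below is hypothetical):

```bash
# Manually drop a clone before the PR is merged (hypothetical clone name)
~/bin/snowsql -r DATA_ENGINEER -q "DROP DATABASE IF EXISTS SYNAPSE_DATA_WAREHOUSE_DEV_SNOW_123_NEW_FEATURE;"
```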
> [!NOTE]
> Keep in mind that after dropping your cloned database, you will still have access to it through Snowflake's "Time Travel" feature. Your database is retained through "Time Travel" for a limited period before it is permanently deleted. To see how long your database can be accessed after dropping, run the following query in a SQL worksheet on Snowsight and look for the parameter `DATA_RETENTION_TIME_IN_DAYS`:
>
> ```
> SHOW PARAMETERS IN DATABASE <your-database-name>;
> ```
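While the clone is still within that retention period, it can be restored with Snowflake's `UNDROP` command if it was dropped too early (clone name hypothetical):

```bash
# Restore a recently dropped clone while Time Travel still retains it (hypothetical clone name)
~/bin/snowsql -r DATA_ENGINEER -q "UNDROP DATABASE SYNAPSE_DATA_WAREHOUSE_DEV_SNOW_123_NEW_FEATURE;"
```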
Following these guidelines helps maintain a clean, efficient, and well-tested codebase. Thank you for contributing!
Review conversation:

Does this trigger if the PR is created with the label already?

Also, what if the PR is already labeled, I make some additional commits to my branch, and I want to `CREATE OR REPLACE ... CLONE` again? Do I need to remove the label and add it again?

Good point, I also had this concern, but from what I've read, CI jobs can't get triggered on a closed PR. To be safe, I'll add this as a test after this gets merged. Yes, you would need to remove it and add it again. I'll add this as a note in the contributing doc.

ab9297a

This has the potential to cause confusion if the clone of my PR does not reflect the HEAD of my feature branch. My preferred implementation would be that new commits to my branch also trigger the clone, if my branch meets the other conditions (PR/labeled).

I would argue that this shouldn't cause confusion, since you can match up the "last updated" on your DB to when you made your last commit, but I see this point. If we go with that implementation, we may as well trigger the job for every commit regardless of whether it's labeled or not. I'm still on the fence with that idea. You mentioned that your CI pipeline for RECOVER triggers a zero-copy clone for every commit, right? Did you experience any technical issues for edge cases where commits are made before the previous run was complete?

If we had continued Snowflake development for RECOVER, we probably would have used CLONE in some capacity. Instead, we had a stored procedure for each table that could be manually invoked to load data into that table, and a SQL script to do this in bulk for every data type, although this script wasn't used in our CI/CD. Snowflake stuff for RECOVER was never used in a production capacity; it was primarily a proof of concept. How come?

If we follow the schemachange workflow, the clone is mainly a space for you to test things manually. Unsure if every commit is necessary right now, but I wonder if we could iteratively get there, or when we have the bandwidth to actually move away from schemachange. There would be additional setup required right now for schemachange to deploy on feature branches into the clone, and it would require more reworking of the CI/CD pipeline. If we have plans on moving away from schemachange in the future, we may want to follow the schemachange framework for now: https://github.com/Snowflake-Labs/schemachange?tab=readme-ov-file#sample-devops-process-flow. I chose this particular framework for ease, and the setup time was much faster than my experience with using the snow CLI to manage all of this. The downside, of course, is the annoying `V_` and `R_` scripts.

My main concern is the risk of running the job for a commit before the previous job has finished. In other words: back-to-back commits causing cloned DB conflicts, or clashing between R scripts that are running against the same clone. I think you bring up a valid point re: manually triggering the job could lead to version inconsistency between the feature branch and the cloned DB, so I'm open to doing the run-for-every-commit implementation. But I need to do a test on this PR for the edge cases I'm worried about, and then we can move forward from there. @philerooski