[SNOW-90] Introduce CI job to test schemachange updates against a clone #78

Open: wants to merge 79 commits into base: dev

jaymedina
Contributor

@jaymedina jaymedina commented Oct 23, 2024

problem

Testing schemachange updates on a branch before merging it into dev is currently a manual process. Automating that testing will speed up the development life cycle. A design document was made to tackle this problem and is available here.

solution

[Diagram: Untitled diagram-2024-10-29-144419]

  • Introduce a new PR label, create_clone_and_run_schemachange, that triggers the automated process of zero-copy cloning synapse_data_warehouse_dev and testing new/modified schemachange scripts against the clone
  • Introduce new CI jobs that reflect the design outlined in the design document
  • Update contribution guidelines to explain these new CI jobs and the corresponding PR labels
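
A minimal sketch of the cloning step, assuming the clone is named by appending a sanitized branch name to the source database name. The helper names and the naming scheme are hypothetical and not taken from this PR; only synapse_data_warehouse_dev and the use of a zero-copy clone come from the description above.

```python
import re

SOURCE_DB = "synapse_data_warehouse_dev"  # dev database named in this PR


def clone_name_for_branch(branch: str, source_db: str = SOURCE_DB) -> str:
    """Derive a database name from a git branch name.

    Hypothetical naming scheme: <source_db>_<sanitized branch>.
    """
    # Keep letters, digits and underscores; map everything else
    # (for example "-" or "/") to "_" so the name works unquoted.
    suffix = re.sub(r"[^A-Za-z0-9_]", "_", branch).strip("_")
    if not suffix:
        raise ValueError(f"branch {branch!r} has no usable characters")
    return f"{source_db}_{suffix}".lower()


def build_clone_sql(branch: str, source_db: str = SOURCE_DB) -> str:
    """Build the zero-copy clone statement for a branch.

    CREATE OR REPLACE means a re-run replaces the existing clone
    instead of creating a second one.
    """
    clone_db = clone_name_for_branch(branch, source_db)
    return f"CREATE OR REPLACE DATABASE {clone_db} CLONE {source_db};"
```

Because the statement is CREATE OR REPLACE, running it twice for the same branch leaves exactly one clone, which lines up with the relabeling behavior this PR sets out to test.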

testing

  • Test that label create_clone_and_run_schemachange triggers the CI job to run
  • Test that a clone is made of the dev db and not the prod db
  • Test that a clone is named after the branch
  • Test that unlabeling and relabeling the branch just replaces the existing clone, and does not make a new one
  • Test that merging the PR triggers drop_clone to run
  • Test that merging the PR DOES NOT trigger create_clone_and_run_schemachange to run
  • Test that labeling the merged PR with create_clone_and_run_schemachange does not trigger the CI job to run
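
The trigger conditions in this checklist can be sketched as a small decision function. This is an illustration only, not the workflow in this PR: the function name and signature are hypothetical, and "labeled" and "closed" are the GitHub pull_request activity types assumed to drive the jobs.

```python
from typing import List, Optional

LABEL = "create_clone_and_run_schemachange"


def jobs_for_event(action: str, pr_state: str, merged: bool,
                   label: Optional[str] = None) -> List[str]:
    """Return the CI jobs a pull_request event should trigger (hypothetical helper)."""
    # Labeling only counts while the PR is still open.
    if action == "labeled" and label == LABEL and pr_state == "open":
        return ["create_clone_and_run_schemachange"]
    # Merging the PR cleans up the clone and must not re-create it.
    if action == "closed" and merged:
        return ["drop_clone"]
    return []
```

For example, `jobs_for_event("labeled", "closed", True, LABEL)` returns an empty list, which corresponds to the last test case above.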

@jaymedina jaymedina added create_clone_and_run_schemachange Create a DB clone and run schemachange against it and removed create_clone_and_run_schemachange Create a DB clone and run schemachange against it labels Oct 23, 2024
@jaymedina jaymedina added the create_clone_and_run_schemachange Create a DB clone and run schemachange against it label Oct 23, 2024
@jaymedina jaymedina added create_clone_and_run_schemachange Create a DB clone and run schemachange against it and removed create_clone_and_run_schemachange Create a DB clone and run schemachange against it labels Nov 5, 2024
@jaymedina
Contributor Author

jaymedina commented Nov 5, 2024

Hey all, just keeping you in the loop with this Snowflake devops stuff:

As I was working on testing an edge case for this PR, I realized there's an issue with using different roles for each step in the CI/CD pipeline. This is how it's working right now:

  • I zero-copy clone the database via the DATA_ENGINEER role since that is the default role that DPE devs have access to and they should be able to see their changes on Snowsight
  • I deploy schemachange updates via the SYSADMIN role since DATA_ENGINEER doesn't have permissions to modify some of the existing tables

So while I'm in DATA_ENGINEER, I can see my cloned DB, but I cannot see my dummy table that I made to test out the R scripts because it was made with SYSADMIN.
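
A toy model of why this happens, assuming DATA_ENGINEER is granted to SYSADMIN (as comments further down this thread indicate) and that privileges are only inherited upward through the role hierarchy. The class and the object names are hypothetical; this is not Snowflake's API, only an illustration of the inheritance direction.

```python
from collections import defaultdict


class RoleGraph:
    """Toy role hierarchy: a role inherits only from roles granted to it."""

    def __init__(self):
        self._inherits_from = defaultdict(set)  # role -> roles granted to it
        self._privileges = defaultdict(set)     # role -> objects it can access directly

    def grant_role(self, child, parent):
        """Model GRANT ROLE child TO ROLE parent (parent inherits from child)."""
        self._inherits_from[parent].add(child)

    def grant_privilege(self, obj, role):
        """Model a direct grant (or ownership) of obj to role."""
        self._privileges[role].add(obj)

    def can_access(self, role, obj):
        """True if role, or any role it inherits from, holds a privilege on obj."""
        stack, seen = [role], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if obj in self._privileges.get(current, ()):
                return True
            stack.extend(self._inherits_from.get(current, ()))
        return False


graph = RoleGraph()
graph.grant_role("DATA_ENGINEER", "SYSADMIN")              # DATA_ENGINEER rolls up into SYSADMIN
graph.grant_privilege("clone_db", "DATA_ENGINEER")         # clone created as DATA_ENGINEER
graph.grant_privilege("clone_db.dummy_table", "SYSADMIN")  # table created as SYSADMIN

print(graph.can_access("SYSADMIN", "clone_db"))                   # True
print(graph.can_access("DATA_ENGINEER", "clone_db.dummy_table"))  # False
```

In this model the table becomes visible to DATA_ENGINEER only through an explicit grant on the table, or by creating it with DATA_ENGINEER (or a role granted to it) in the first place.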

@thomasyu888, would it be possible for you to make a separate role for CI/CD work instead of the pipeline using DATA_ENGINEER and SYSADMIN for the two separate steps? This role would:

  • Inherit the access privileges of SYSADMIN (so that schemachange can run without getting errors like these)
  • Be accessible to DPE engineers so they can view their schemachange deployment on their cloned DB on Snowsight

A related question:
I'm assuming the assets currently on Snowflake were made by running schemachange via the SYSADMIN role, correct? I'm able to see these assets while on DATA_ENGINEER, so why is this an issue suddenly?

cc @philerooski

@thomasyu888
Member

A related question: I'm assuming the assets currently on Snowflake were made by running schemachange via the SYSADMIN role, correct? I'm able to see these assets while on DATA_ENGINEER, so why is this an issue suddenly?

@jaymedina , @philerooski

Admittedly, the roles and grants in the database are a bit all over the place... I did manage to do a role cleanup and moved everything so that the SYSADMIN role is granted access to all the roles. BUT... one way to do this is to create a dedicated CI/CD role, give it OWNERSHIP of all the tables, and have that role roll up into DATA_ENGINEER. The CI/CD role can then only be assumed by a service account. (That said, that may be splitting hairs, because if the CI/CD role rolls up into DATA_ENGINEER, then DATA_ENGINEER will have all the permissions that the CI/CD role has.)

image

I'll let @philerooski ultimately decide, but if there's other snowflake work to be done, we can follow the process that exists now until we can thoughtfully implement this all the way.

@philerooski
Contributor

@thomasyu888 @jaymedina

Snowflake has a suggested framework for organizing roles which we will adopt at some point in the near future. As far as Snowflake objects are concerned, SYSADMIN ought to inherit privileges on all objects. So a role which is responsible for running our CI/CD and inherits privileges from SYSADMIN wouldn't be possible within this framework.

DATA_ENGINEER would be one of the Custom Roles in the framework. What I'm wondering: what makes DATA_ENGINEER less privileged than SYSADMIN? Which object privileges in our account do we want to reserve only for SYSADMIN?

To the point: whichever role we use for CI/CD, it ought to have privileges on every object in the dev Synapse data warehouse database. This might be SYSADMIN, or it could be a more precisely scoped role (SYSADMIN_DEV? I'm open to better ideas, or just using SYSADMIN). The cleanest way to then grant DATA_ENGINEER privileges on the objects which the CI/CD role has created is to create a DATABASE ROLE specific to the cloned database and grant it to DATA_ENGINEER. (Do we automatically get a clone of the Synapse dev database's DATABASE ROLE, if it exists?)

Here's one possible implementation:
snowflake_cicd_roles
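
Separately from the diagram, the database-role idea can be roughed out as a list of statements. This is a sketch under assumptions: the function name, the database role name CLONE_READER, and the choice of USAGE plus SELECT privileges are all hypothetical and would need to be checked against the privileges the team actually wants.

```python
from typing import List


def database_role_grants(clone_db: str,
                         db_role: str = "CLONE_READER",         # hypothetical name
                         account_role: str = "DATA_ENGINEER") -> List[str]:
    """Build statements that expose a cloned database through a database role."""
    qualified = f"{clone_db}.{db_role}"  # database roles are scoped to one database
    return [
        f"CREATE DATABASE ROLE IF NOT EXISTS {qualified};",
        f"GRANT USAGE ON DATABASE {clone_db} TO DATABASE ROLE {qualified};",
        f"GRANT USAGE ON ALL SCHEMAS IN DATABASE {clone_db} TO DATABASE ROLE {qualified};",
        f"GRANT SELECT ON ALL TABLES IN DATABASE {clone_db} TO DATABASE ROLE {qualified};",
        # Granting the database role to the account role is what makes the
        # clone's objects visible to engineers.
        f"GRANT DATABASE ROLE {qualified} TO ROLE {account_role};",
    ]
```

The statements would be run by whichever role owns the clone, right after the clone is created, so the grants stay scoped to that one database.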

@thomasyu888
Member

thomasyu888 commented Nov 5, 2024

Snowflake has a suggested framework for organizing roles which we will adopt at some point in the near future. As far as Snowflake objects are concerned, SYSADMIN ought to inherit privileges on all objects.

@philerooski just a note here - I did that role refactor based off of your learnings a little bit ago - and it is a lot cleaner.

Furthermore, the roles organization will be complicated by the jumpcloud integration, but that's out of scope for this PR.

image

Some questions

  1. Do we just want to give DATA_ENGINEER ownership rights to tables? (It cascades up into SYSADMIN anyway.)
  2. Can schemachange run as the DATA_ENGINEER role for these specific jobs?
  3. During cloning, we could add a grant statement to GRANT ALL FUTURE DYNAMIC TABLES TO ROLE DATA_ENGINEER. Thoughts?

I do like your idea of having a CLONED_DB_DATABASE_ROLE, but then that still doesn't resolve the fact that schemachange runs as SYSADMIN right now
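
As a rough sketch of suggestion 3, the future grants could be generated per object type and scoped to the cloned database only. The function name, the default object types, and the use of ALL PRIVILEGES are assumptions for illustration; the exact statement form should be verified against Snowflake's GRANT documentation before use.

```python
from typing import List, Sequence


def future_grant_statements(clone_db: str,
                            role: str = "DATA_ENGINEER",
                            object_types: Sequence[str] = ("TABLES", "DYNAMIC TABLES"),
                            privileges: str = "ALL PRIVILEGES") -> List[str]:
    """Build future-grant statements limited to a single cloned database.

    Assumed statement shape:
    GRANT <privileges> ON FUTURE <object types> IN DATABASE <db> TO ROLE <role>
    """
    return [
        f"GRANT {privileges} ON FUTURE {object_type} IN DATABASE {clone_db} TO ROLE {role};"
        for object_type in object_types
    ]
```

Because every statement names the clone explicitly, nothing is granted on the dev or prod databases themselves.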

@jaymedina
Contributor Author

Yes, as a temporary solution, I can implement suggestions 2 and 3 so we can use DPE_ENGINEER for the schemachange execution. I'll be adding this in the next commit.

@philerooski
Contributor

I did that role refactor based off of your learnings a little bit ago

Should the ticket be closed or modified, then?

During cloning, we could add a grant statement to GRANT ALL FUTURE DYNAMIC TABLES TO ROLE DATA_ENGINEER. Thoughts?

What is it about dynamic tables specifically that requires this statement? Do we need FUTURE grants for other types of objects?

My main concern with the proposed workaround is that DATA_ENGINEER is quickly becoming a behemoth with no clear delineation between itself and SYSADMIN. But if we're in agreement that this is a temporary solution then I'm fine with proceeding.

@jaymedina
Contributor Author

My main concern with the proposed workaround is that DATA_ENGINEER is quickly becoming a behemoth with no clear delineation between itself and SYSADMIN

I agree with this. I don't think we should permanently grant the engineer role SYSADMIN privileges. A heads up: I'm fiddling around with a temporary solution for this until the role re-org. My commits will show me granting privileges to DPE_ENGINEER, but I'm being mindful that these privileges will only exist within the scope of the cloned DB, and not within the actual dev and prod databases.


sonarcloud bot commented Nov 6, 2024

@thomasyu888
Member

My main concern with the proposed workaround is that DATA_ENGINEER is quickly becoming a behemoth with no clear delineation between itself and SYSADMIN

@philerooski / @jaymedina, I think there should be another ticket to fully determine whether the DATA_ENGINEER role should just be the OWNER of synapse_data_warehouse and synapse_data_warehouse_dev. This way, the DATA_ENGINEER role will be able to do all the operations it needs to within those databases.

Similarly, since the DATA_ENGINEER role is rolled up into SYSADMIN, SYSADMIN should be able to do everything the DATA_ENGINEER role can do. Note: I'm not proposing that DATA_ENGINEER have the same level of permissions as SYSADMIN. What I'm saying is that the DATA_ENGINEER role could have all the privileges the SYSADMIN role currently has WITHIN those two databases and their clones.

^ That said, I'm not tied to that, and I'll let Phil decide when we get to that design ticket.

Labels
create_clone_and_run_schemachange Create a DB clone and run schemachange against it

3 participants