Use new GKE cluster for ci-pr.yaml #2418

Merged: NimJay merged 6 commits into main from nimjay-ci on Mar 14, 2024

@NimJay (Collaborator) commented Mar 13, 2024

Background

  • Currently, the deployment-tests GitHub Action is broken for other pull requests (see example error logs).
  • Specifically, deployment of the adservice (Java microservice) is failing:
  • The error we're seeing in the GKE cluster:
pod/adservice-746b58f97b-gsxzr: container server terminated with exit code 139 ...
A fatal error has been detected by the Java Runtime Environment: ...
Problematic frame:
      > [adservice-746b58f97b-gsxzr server] # C  [profiler_java_agent.so+0x897522]  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(char const*)+0xc ...
Can not save log file, dump to screen...
VM state: not at safepoint (not fully initialized) ...
VM Mutex/Monitor currently owned by a thread: None ...
  • Our investigation points to the nodes currently used by the "online-boutique-prs" GKE cluster as the cause, because the same images deploy fine in other clusters.

Changes from this PR

  • In this PR, we:
    • Add Terraform for a new Autopilot GKE cluster ("prs-gke-cluster"). I have already run terraform apply.
    • Update GitHub Actions to use the new GKE cluster instead of the existing "online-boutique-prs" cluster.
    • Add a README.md about using the Terraform.

How to test

  • We have to make sure the GitHub checks work (in this pull-request). Passed. ✅
  • We should also check the staging URL. Works. ✅
  • After merging, we should check that:
    • cleanup.yaml successfully cleans up the staging deployment created by this pull-request.
    • ci-main.yaml successfully deploys the HEAD commit in main to the new cluster (prs-gke-cluster).

Additional info

@NimJay changed the title from "Use new PR cluster for ci-pr.yaml" to "Use new GKE cluster for ci-pr.yaml" on Mar 13, 2024

🚲 PR staged at http://34.133.162.82

@bourgeoisor (Member) left a comment

LGTM. Checks are all passing.

- PR_CLUSTER: "online-boutique-prs"
- ZONE: "us-central1-c"
+ PR_CLUSTER: "prs-gke-cluster"
+ REGION: "us-central1"

@NimJay (Collaborator, Author) commented Mar 14, 2024

Rationale for change: Autopilot clusters are always regional, whereas the old Standard cluster was zonal, so the workflow now specifies a REGION instead of a ZONE.
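
To illustrate the regional-versus-zonal point, here is a minimal sketch of how the location flag passed to `gcloud container clusters get-credentials` would differ between the two clusters. The `location_flag` helper and its mode names are hypothetical and not part of this PR; the commands are only printed, not run.

```shell
# Sketch only: pick the location flag for a cluster based on its mode.
# location_flag and the mode names ("autopilot", "zonal") are hypothetical.
location_flag() {
  # $1 = cluster mode, $2 = region or zone name
  case "$1" in
    autopilot) printf '%s\n' "--region=$2" ;;  # Autopilot clusters are regional
    zonal)     printf '%s\n' "--zone=$2" ;;    # e.g. the old zonal Standard cluster
    *)         echo "unknown cluster mode: $1" >&2; return 1 ;;
  esac
}

# Print (rather than execute) the command each cluster would need.
echo "gcloud container clusters get-credentials prs-gke-cluster $(location_flag autopilot us-central1)"
echo "gcloud container clusters get-credentials online-boutique-prs $(location_flag zonal us-central1-c)"
```

Passing a zone to a regional cluster (or the reverse) makes the lookup fail, which is why the environment variable had to change along with the cluster name.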

* Ideally, you would see `Apply complete! Resources: 0 added, 0 changed, 0 destroyed.` in the output.
1. Make your desired changes to the Terraform code.
1. Apply the Terraform: `terraform apply -var project_id=${PROJECT_ID}`
* This time, Terraform will prompt you to confirm your changes before applying them.
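
The README steps quoted here could be scripted roughly as follows. This is a sketch, assuming it is run from the folder that holds the Terraform code. Only the `terraform apply -var project_id=${PROJECT_ID}` command comes from the README; the `terraform init` and `terraform plan` steps, the `run` wrapper, the `DRY_RUN` variable, and the placeholder project ID are illustrative additions. It defaults to a dry run that only prints each command.

```shell
# Sketch only: run (or just print) the Terraform commands for this workflow.
# DRY_RUN defaults to 1 so that nothing touches real infrastructure by accident.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"   # dry run: show the command instead of executing it
  else
    "$@"
  fi
}

# Placeholder value; replace with the real Google Cloud project ID.
PROJECT_ID="${PROJECT_ID:-my-project-id}"

run terraform init                                    # set up the backend and providers
run terraform plan -var "project_id=${PROJECT_ID}"    # preview the pending changes
run terraform apply -var "project_id=${PROJECT_ID}"   # prompts for confirmation
```

Setting `DRY_RUN=0` executes the commands for real; `terraform apply` without `-auto-approve` then asks for confirmation, as the README notes.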

@NimJay (Collaborator, Author) commented Mar 14, 2024

Future: Ideally, these instructions would be made redundant by a GitHub Action or Cloud Build trigger that runs terraform apply on any Terraform merged into the main branch.

Reply from a Member:

Interestingly, I have a similar issue on the GKE samples side: GoogleCloudPlatform/kubernetes-engine-samples#611

    bucket = "cicd-terraform-state"
    prefix = "terraform-state"
  }
}

@NimJay (Collaborator, Author) commented Mar 14, 2024

Heads-up: I stored the Terraform state in this Google Cloud Storage (GCS) bucket.

@NimJay marked this pull request as ready for review on March 14, 2024, 18:36
@NimJay requested review from yoshi-approver and a team as code owners on March 14, 2024, 18:36

@NimJay (Collaborator, Author) commented

I've put everything inside the .github/terraform folder to match https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.

@NimJay requested a review from bourgeoisor on March 14, 2024, 18:47
@bourgeoisor (Member) left a comment

LGTM!

@@ -0,0 +1,116 @@
/**

Comment from a Member:

This is such a great idea. I know we have an internal doc listing the infra we need to recreate the CI/CD environment for Online Boutique, but it's nice to have it all codified and easily reproducible in a TF script. Thanks!

@NimJay merged commit 1ef9480 into main on Mar 14, 2024 (10 checks passed)
@NimJay deleted the nimjay-ci branch on March 14, 2024, 20:59
@NimJay mentioned this pull request on Apr 24, 2024 (5 tasks)