Cloud Function deployment is Flaky: Error code 3, message: Failed to retrieve function source code #6132

bharathkkb commented Apr 17, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v0.12.20

  • provider.google v3.17.0
  • provider.google-beta v3.17.0

Affected Resource(s)

  • google_cloudfunctions_function

Terraform Configuration Files

//Version config
terraform {
  required_version = ">= 0.12.10"
}

provider "google" {
  version = "~> 3.17.0"
}

provider "random" {}

resource "random_id" "random_number" {
  byte_length = 2
}



data "archive_file" "archive_code" {
  type        = "zip"
  source_dir  = pathexpand("functions_code/")
  output_path = pathexpand("functions_code.zip")
}


module "project-factory" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 7.1"

  name              = "test"
  random_project_id = "true"
  org_id            = "foo"
  billing_account   = "bar"
  activate_apis = [
    "storage-component.googleapis.com",
    "cloudfunctions.googleapis.com",
  ]
  skip_gcloud_download = true
}

resource "google_storage_bucket" "functions_bucket" {
  name               = "gcf_bucket-${random_id.random_number.dec}"
  location           = "US-CENTRAL1"
  storage_class      = "STANDARD"
  force_destroy      = "true"
  bucket_policy_only = "true"
  project            = module.project-factory.project_id
}

resource "google_storage_bucket_object" "functions_code_archive" {
  name                = "functions_code.zip"
  bucket              = google_storage_bucket.functions_bucket.name
  source              = data.archive_file.archive_code.output_path
  storage_class       = "STANDARD"
  content_disposition = "attachment"
  content_encoding    = "gzip"
  content_type        = "application/zip"
}


resource "google_cloudfunctions_function" "function" {
  name                  = "python_function"
  description           = "Pull code from GCS"
  available_memory_mb   = 256
  region                = "us-central1"
  runtime               = "python37"
  trigger_http          = true
  entry_point           = "test_function"
  timeout               = 60
  source_archive_bucket = google_storage_bucket.functions_bucket.name
  source_archive_object = google_storage_bucket_object.functions_code_archive.name
  project               = module.project-factory.project_id
}

Example Py function

def test_function(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:
        return f'It works!'

Debug Output

https://gist.github.com/bharathkkb/23ac7ee3539419a0900351d5594b36a9

Panic Output

N/A

Expected Behavior

Cloud Function should be deployed consistently

Actual Behavior

Cloud Function deployment is flaky.
About half the time it errors out with "Error code 3, message: Failed to retrieve function source code".

Steps to Reproduce

  1. terraform apply

Important Factoids

It always works when apply is run again after failure. Some kind of race condition, maybe?

After a failure, I can go to the failed Cloud Function in the UI, copy it, and deploy the copy with the same config, and it works.
This suggests that either I am missing an explicit dependency or it's a bug.

References

@ghost ghost added the bug label Apr 17, 2020
@bharathkkb
Author

cc @angelchang

angelchang commented Apr 17, 2020

Encountering the same issue without using project-factory.
This code replaces the project-factory module shown in the issue:

#creating project under a folder
resource "google_project" "project" {
  name            = "name"
  project_id      = "random_id"
  folder_id       = "foo"
  billing_account = "account"

  #local-exec to avoid https://github.com/terraform-providers/terraform-provider-google/issues/5649
  provisioner "local-exec" {
    command = "sleep 10"
  }
}
#enable api
resource "google_project_service" "gcf_api" {
  project                    = "project_id"
  service                    = "cloudfunctions.googleapis.com"
  disable_dependent_services = true
}

@venkykuberan venkykuberan self-assigned this Apr 17, 2020
@morgante

My initial assumption is that object creation might be returning successful before the object is fully propagated.

@angelchang

The Console UI shows that the object was uploaded to the bucket before the function attempted to deploy. I also tried adding a sleep before the function deployment, triggered by the object upload.

@venkykuberan
Contributor

This seems to me an eventual consistency issue; when I separate out the project creation from the rest of the config, it works fine all the time. Another developer I discussed this with feels the same. I would recommend separating out the project config or implementing some explicit delays after project creation. As what the provider can do is limited here, I am closing this issue for now. Please reopen if you feel otherwise.

@bharathkkb
Author

@venkykuberan Thanks for looking into this! I had a couple of follow-up questions.

  • We are creating the project, then enabling APIs and uploading. It seems like this should give it enough time to become eventually consistent?

  • Is there a specific timeout that you recommend to get consistent applies?

  • Is this a limitation on the GCP API side that is specific to Cloud Functions? I have other TF configs that include project creation + GKE, project creation + GCE, etc., which seem to work fine.

@venkykuberan
Contributor

@bharathkkb Yes, it's a limitation on the GCP side; we observed this pattern between the Project API and the Storage APIs. We have an active issue, #5649, to track that.

@bharathkkb
Author

@venkykuberan makes sense. Thank you

@bharathkkb
Author

This seems to me an eventual consistency issue; when I separate out the project creation from the rest of the config, it works fine all the time.

Hi @venkykuberan
I noticed the same issue today when project creation was not part of the same config. I passed in a pre-created project ID that already had billing enabled.
Could you share your config that you tested out?

@danawillow danawillow reopened this Apr 24, 2020
@danawillow
Contributor

If an apply immediately after the failure works, that implies that we should have the resource retry the create until the API is consistent.
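
To illustrate the retry idea, here is a minimal Python sketch. It is not the provider's code (the provider is written in Go), and `create_with_retry`, `RETRYABLE_MESSAGE`, and the assumption that the failure surfaces as a `RuntimeError` are all hypothetical names and choices made for this example. It retries only when the error text contains the message reported in this issue, and doubles the delay after each failed attempt.

```python
import time

# Error text reported in this issue; treated here as the only retryable failure.
RETRYABLE_MESSAGE = "Failed to retrieve function source code"


def create_with_retry(create, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call create() until it succeeds or the attempts are used up.

    Only errors whose text contains RETRYABLE_MESSAGE are retried; any other
    error, or the error from the final attempt, is re-raised unchanged.
    The delay doubles after each failed attempt (2s, 4s, 8s, ...).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return create()
        except RuntimeError as err:
            is_last_attempt = attempt == attempts - 1
            if RETRYABLE_MESSAGE not in str(err) or is_last_attempt:
                raise
            sleep(base_delay * 2 ** attempt)
```

Here `create` stands in for whatever call actually creates the function. Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.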

@bharathkkb
Author

@danawillow yes, apply immediately after has been consistently working

@emilymye
Contributor

When you run into this issue, do you happen to notice whether it tries to recreate the cloud function in the second apply? i.e.

  1. first apply errors out with this error
  2. second apply works --> does it output a plan and finish with "Resources created: 1"? Or does it automatically just detect the function has been created?

Mostly, I'm wondering when this error happens whether we need to retry creation of the function or if it's that the operation returns an error but the function was actually created properly.

@angelchang

@emilymye
Terraform recreates the function.
Here is the workflow:

  1. first apply errors out
  2. terraform plan displays google_cloudfunctions_function.function is tainted, so must be replaced
  3. terraform apply Resources: 1 added, 1 destroyed. (consistently works)
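
Until the provider handles this itself, the workflow above can be automated on the user side. The following Python wrapper is a rough sketch under stated assumptions: `apply_with_one_retry` and `run_terraform_apply` are hypothetical helper names, the `terraform` binary is assumed to be on `PATH`, and the script is assumed to run from the configuration directory. It re-applies once, and only when the output contains the error message from this issue.

```python
import subprocess

# Error text reported in this issue; any other failure is not retried.
SOURCE_ERROR = "Failed to retrieve function source code"


def run_terraform_apply():
    """Run `terraform apply -auto-approve`; return (exit code, combined output)."""
    proc = subprocess.run(
        ["terraform", "apply", "-auto-approve"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def apply_with_one_retry(run=run_terraform_apply):
    """Apply once, then apply a second time only if the known flaky error appeared."""
    code, output = run()
    if code != 0 and SOURCE_ERROR in output:
        # The failed function is left tainted, so the second apply replaces it.
        code, output = run()
    return code, output
```

Calling `apply_with_one_retry()` returns the exit code and output of the last apply that ran. The `run` parameter exists so the retry logic can be checked with a stand-in instead of a real Terraform run.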

andreyka commented May 26, 2020

Hello everyone,

I have the same issue. It seems that GCP uses its own internal source code storage bucket and synchronizes it eventually. Adding a null_operator to wait doesn't help. When I get the error:

Error waiting for Creating CloudFunctions Function: Error code 3, message: Failed to retrieve function source code...

I find that I cannot download the code when I press the download button, even though the source code already exists in my src_object by the time the function is created.

[Screenshot: error_screen]

[Screenshot: error_screen_download]

ghost commented Jun 27, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Jun 27, 2020