
Add Vertex AI Dataset #4863

Merged (4 commits) on Jun 21, 2021
Conversation

upodroid (Contributor)

Part of hashicorp/terraform-provider-google#9298

If this PR is for Terraform, I acknowledge that I have:

  • Searched through the issue tracker for an open issue that this either resolves or contributes to, commented on it to claim it, and written "fixes {url}" or "part of {url}" in this PR description. If there were no relevant open issues, I opened one and commented that I would like to work on it (not necessary for very small changes).
  • Generated Terraform, and ran make test and make lint to ensure it passes unit and linter tests.
  • Ensured that all new fields I added that can be set by a user appear in at least one example (for generated resources) or third_party test (for handwritten resources or update tests).
  • Ran relevant acceptance tests (If the acceptance tests do not yet pass or you are unable to run them, please let your reviewer know).
  • Read the Release Notes Guide before writing my release note below.

Release Note Template for Downstream PRs (will be copied)

`google_vertex_ai_dataset`

Co-authored-by: upodroid <[email protected]>
@google-cla bot added the `cla: yes` label on Jun 11, 2021

@modular-magician (Collaborator)

Hello! I am a robot who works on Magic Modules PRs.

I have detected that you are a community contributor, so your PR will be assigned to someone with a commit-bit on this repo for initial review.

Thanks for your contribution! A human will be with you soon.

@melinath, please review this PR or find an appropriate assignee.

@modular-magician (Collaborator)

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 8 files changed, 992 insertions(+), 2 deletions(-))
Terraform Beta: Diff ( 8 files changed, 992 insertions(+), 2 deletions(-))
TF Conversion: Diff ( 2 files changed, 118 insertions(+))
TF OiCS: Diff ( 4 files changed, 106 insertions(+))

@upodroid (Contributor, Author) commented Jun 11, 2021

@rileykarson We need to rewrite the Operations handling code to deal with APIs that require region-specific API subdomains to be inferred for the URL.

This is the first API (excluding Cloud Run, where we seem to poll the status field of the resource for changes instead of tracking an Operation) that has regional endpoints requiring the API subdomain to be inferred from the location parameter.

2021/06/11 18:04:29 [DEBUG] Waiting for state to become: [done: true]
2021/06/11 18:04:29 [DEBUG] Waiting for state to become: [success]
2021/06/11 18:04:29 [WARN] Got error running Terraform: exit status 1

Error: Error waiting to create Dataset: Error waiting for Creating Dataset: error while retrieving operation: parse "https://{{location}}-aiplatform.googleapis.com/v1/projects/550924169191/locations/us-central1/datasets/7207105206025191424/operations/7881554709473918976": invalid character "{" in host name

  on terraform_plugin_test.tf line 2, in resource "google_vertex_ai_dataset" "dataset":
   2: resource "google_vertex_ai_dataset" "dataset" {


    provider_test.go:276: Step 1/2 error: Error running apply: exit status 1
        
        Error: Error waiting to create Dataset: Error waiting for Creating Dataset: error while retrieving operation: parse "https://{{location}}-aiplatform.googleapis.com/v1/projects/550924169191/locations/us-central1/datasets/7207105206025191424/operations/7881554709473918976": invalid character "{" in host name
        
          on terraform_plugin_test.tf line 2, in resource "google_vertex_ai_dataset" "dataset":
           2: resource "google_vertex_ai_dataset" "dataset" {
        
func createVertexAIWaiter(config *Config, op map[string]interface{}, project, activity, userAgent string) (*VertexAIOperationWaiter, error) {
	w := &VertexAIOperationWaiter{
		Config:    config,
		UserAgent: userAgent,
		Project:   project,
	}
	if err := w.CommonOperationWaiter.SetOp(op); err != nil {
		return nil, err
	}
	return w, nil
}

// nolint: deadcode,unused
func vertexAIOperationWaitTimeWithResponse(config *Config, op map[string]interface{}, response *map[string]interface{}, project, activity, userAgent string, timeout time.Duration) error {
	w, err := createVertexAIWaiter(config, op, project, activity, userAgent)
	if err != nil {
		return err
	}
	if err := OperationWait(w, activity, timeout, config.PollInterval); err != nil {
		return err
	}
	return json.Unmarshal([]byte(w.CommonOperationWaiter.Op.Response), response)
}

I'm thinking of passing `d` through the Op function (`vertexAIOperationWaitTime`) from the Create/Update/Delete functions and calling `replaceVars` somewhere in `vertexAIOperationWaitTime` in vertex_ai_operation.go, or in `QueryOp` in common_operation.go.

Let me know what you think.
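For illustration, the substitution the waiter would need to perform might look like the following. This is a minimal sketch, assuming the resource's location could be plumbed through to the waiter; `buildRegionalOpURL` is a hypothetical name, not the provider's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// buildRegionalOpURL sketches resolving the {{location}} placeholder in the
// Vertex AI operation URL template once a concrete location is available.
// The name and signature are illustrative only.
func buildRegionalOpURL(location, opName string) string {
	base := "https://{{location}}-aiplatform.googleapis.com/v1/%s"
	// Substitute the region into the host, then append the operation name.
	return fmt.Sprintf(strings.ReplaceAll(base, "{{location}}", location), opName)
}

func main() {
	fmt.Println(buildRegionalOpURL("us-central1",
		"projects/my-project/locations/us-central1/operations/123"))
}
```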

@modular-magician (Collaborator)

I have triggered VCR tests based on this PR's diffs. See the results here: "https://ci-oss.hashicorp.engineering/viewQueued.html?itemId=191770"

@melinath (Member)

This is the first API (excluding Cloud Run, where we seem to poll the status field of the resource for changes instead of tracking an Operation) that has regional endpoints requiring the API subdomain to be inferred from the location parameter.
[snip]
I'm thinking of passing `d` through the Op function (`vertexAIOperationWaitTime`) from the Create/Update/Delete functions and calling `replaceVars` somewhere in `vertexAIOperationWaitTime` in vertex_ai_operation.go, or in `QueryOp` in common_operation.go.

Hmm, yeah, it looks like QueryOp would likely be the place to make this call. It's a bit unfortunate that we're going to end up modifying all the other operation waiters but I don't know that it would be worth avoiding that.

@upodroid (Contributor, Author)

Can someone reopen hashicorp/terraform-provider-google#4624 and target it for the v4 release? I found this while reworking Operations; it is a v3 removal that didn't happen on time.

I have adjusted all the Operations used by MM and by partially handwritten products. I didn't edit operations for products like Cloud Functions, as the entire API is handwritten.

@melinath (Member) left a comment

It looks like there are some file conflicts that need to be resolved. After some more consideration & talking with the team, would it be possible to implement this without the plumbing? I'm sorry to be backtracking on this; the summary is that this is the only resource we expect to need this feature in all of MMv1 ever, so it would be preferable to use a custom operation handler that extracts the location from the operation path (which should already contain it) rather than doing this plumbing.

@upodroid (Contributor, Author)

Hmm

From: https://github.com/modular-magician/terraform-provider-google/compare/auto-pr-4863-old..auto-pr-4863

func (w *VertexAIOperationWaiter) QueryOp() (interface{}, error) {
	if w == nil {
		return nil, fmt.Errorf("Cannot query operation, it's unset or nil.")
	}
	// Returns the proper get.
	url := fmt.Sprintf("https://{{location}}-aiplatform.googleapis.com/v1/%s", w.CommonOperationWaiter.Op.Name)

	return sendRequest(w.Config, "GET", w.Project, url, w.UserAgent, nil)
}

Let me see if I can pull location/region cleanly from w.CommonOperationWaiter.Op.Name and write vertex_ai_operation.go by hand.
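As a rough sketch of that idea, the region could be pulled out of the operation name by scanning for the segment after `locations`. This is a minimal, self-contained illustration under that assumption; `regionFromOpName` is a hypothetical helper, not the provider's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// regionFromOpName extracts the region from an operation name such as
// "projects/p/locations/us-central1/datasets/1/operations/2" by returning
// the path segment that follows "locations". Hypothetical helper for
// illustration only.
func regionFromOpName(name string) (string, error) {
	parts := strings.Split(name, "/")
	for i, p := range parts {
		if p == "locations" && i+1 < len(parts) {
			return parts[i+1], nil
		}
	}
	return "", fmt.Errorf("no locations/{region} segment in %q", name)
}

func main() {
	region, err := regionFromOpName("projects/p/locations/us-central1/datasets/1/operations/2")
	fmt.Println(region, err)
}
```

With the region in hand, the `{{location}}` placeholder in the QueryOp URL could be replaced before the GET is sent, avoiding any plumbing of `d` through the waiter.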

@upodroid (Contributor, Author)

How do you exclude *_operation.go from being generated by MM like appengine_operation.go?

Keep getting:

I, [2021-06-16T20:11:54.800214 #97565]  INFO -- : products/iambeta: Not specified, skipping generation
I, [2021-06-16T20:11:55.078214 #97565]  INFO -- : products/sql: Compiling provider config
I, [2021-06-16T20:11:55.294222 #97565]  INFO -- : products/sql: Not specified, skipping generation
I, [2021-06-16T20:11:55.294702 #97565]  INFO -- : Copying common files for terraform
#<Thread:0x00007fd1e20f0f08@/Users/REDACTED/Desktop/Git/magic-modules/mmv1/provider/core.rb:143 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
/Users/REDACTED/Desktop/Git/magic-modules/mmv1/provider/core.rb:152:in `block (2 levels) in copy_file_list': /Users/REDACTED/go/src/github.com/hashicorp/terraform-provider-google-beta/google-beta/vertex_ai_operation.go was already modified during this run. 2021-06-16 20:11:23 +0000 (RuntimeError)
bundler: failed to load command: compiler (compiler)
RuntimeError: /Users/REDACTED/go/src/github.com/hashicorp/terraform-provider-google-beta/google-beta/vertex_ai_operation.go was already modified during this run. 2021-06-16 20:11:23 +0000
  /Users/REDACTED/Desktop/Git/magic-modules/mmv1/provider/core.rb:152:in `block (2 levels) in copy_file_list'

@melinath (Member) commented Jun 18, 2021

I think that if you leave out autogen_async that should do it: https://github.com/GoogleCloudPlatform/magic-modules/blob/master/mmv1/provider/terraform.rb#L237

@upodroid (Contributor, Author)

It is fixed and ready.

@melinath (Member) left a comment

LGTM as a general approach; the build is currently failing because of the deletion of compute_shared_operation.go. That deletion seems unrelated to this PR; if that's correct, could you undo it for now / move it to a separate PR?

Also, could you add a few unit tests for GetRegionFromRegionalSelfLink?
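For illustration, table-driven checks of the kind requested here might cover plain resource names, full regional URLs, and inputs with no `locations` segment. The stub below is a self-contained stand-in for the helper under review (the real one lives in self_link_helpers.go); both the stub and the cases are assumptions, not the provider's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// getRegionFromRegionalSelfLink is an illustrative stand-in: it returns the
// path segment after "locations", or the input unchanged when none exists.
func getRegionFromRegionalSelfLink(selfLink string) string {
	parts := strings.Split(selfLink, "/")
	for i, p := range parts {
		if p == "locations" && i+1 < len(parts) {
			return parts[i+1]
		}
	}
	return selfLink
}

func main() {
	// Table-driven cases of the sort the requested unit tests could exercise.
	cases := map[string]string{
		"projects/my-project/locations/us-central1/datasets/123": "us-central1",
		"https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/europe-west4/datasets/9": "europe-west4",
		// No locations segment: input is returned unchanged.
		"projects/my-project/global/networks/default": "projects/my-project/global/networks/default",
	}
	for in, want := range cases {
		if got := getRegionFromRegionalSelfLink(in); got != want {
			fmt.Printf("FAIL %q: got %q, want %q\n", in, got, want)
		}
	}
	fmt.Println("all cases checked")
}
```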

Review comment on mmv1/third_party/terraform/utils/self_link_helpers.go (outdated, since resolved)
@modular-magician (Collaborator)

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 9 files changed, 790 insertions(+), 2 deletions(-))
Terraform Beta: Diff ( 9 files changed, 790 insertions(+), 2 deletions(-))
TF Conversion: Diff ( 3 files changed, 126 insertions(+))
TF OiCS: Diff ( 4 files changed, 106 insertions(+))

@modular-magician (Collaborator)

I have triggered VCR tests based on this PR's diffs. See the results here: "https://ci-oss.hashicorp.engineering/viewQueued.html?itemId=192739"

@modular-magician (Collaborator)

I have triggered VCR tests in RECORDING mode for the following tests that failed during VCR: TestAccVertexAIDataset_vertexAiDatasetExample|TestAccTags. You can view the result here: "https://ci-oss.hashicorp.engineering/viewQueued.html?itemId=192832"

@upodroid (Contributor, Author)

This looks ready; it would be great if I could get it merged today so I can start working on other Vertex AI resources.

@melinath (Member) left a comment

LGTM

@melinath (Member)

The tests failed because the API wasn't enabled; here's a test run just for the new test: https://ci-oss.hashicorp.engineering/buildConfiguration/GoogleCloudBeta_ProviderGoogleCloudBetaMmUpstreamVcr/193010

@melinath (Member)

Test passed.

@racosta commented Mar 15, 2022

I know this has been merged, but I'm not sure whether what I'm seeing is a bug to be reported or an enhancement to be requested.

The example provided in this PR and shown in the documentation creates the resource based on the metadata schema, but there doesn't seem to be a way to pass configuration details for what that schema expects.

For example (slightly modifying the one provided in the documentation):

resource "google_vertex_ai_dataset" "dataset" {
  display_name          = "terraform"
  metadata_schema_uri   = "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml"
  region                = "us-central1"
}

The schema refers to this document:

title: Tabular
type: object
description: >
  The metadata of tabular Datasets. Can be used in Dataset.metadata_schema_uri
  field.
properties:
  inputConfig:
    description: >
      The tabular Dataset's data source. The Dataset doesn't store the data
      directly, but only pointer(s) to its data.
    oneOf:
    - type: object
      properties:
        type:
          type: string
          enum: [gcs_source]
        uri:
          type: array
          items:
            type: string
          description: >
            Cloud Storage URI of one or more files. Only CSV files are supported.
            The first line of the CSV file is used as the header.
            If there are multiple files, the header is the first line of
            the lexicographically first file, the other files must either
            contain the exact same header or omit the header.
    - type: object
      properties:
        type:
          type: string
          enum: [bigquery_source]
        uri:
          type: string
          description: The URI of a BigQuery table.
    discriminator:
      propertyName: type

Naturally, I would want to configure the inputConfig to point to either a GCS bucket URI or a BigQuery URI. There doesn't seem to be a way to do that, unless I've missed it while searching through the code.

Should I open a bug or an enhancement for this issue?

@melinath (Member)

@racosta It sounds like you're requesting a new field, so that would be an enhancement request.
