@kumare3 could you please take a look? @surindersinghp reviewed it previously and was OK with it. The PR adds the catalog client to cache task executions to Data Catalog. Added tests, documentation, and configs. In the future, when we move the IDL to FlyteIDL, we can remove the old legacy client.
}
logger.Debugf(ctx, "Created tag: %v, for task: %v", tagName, task.Id)

// TODO: We should create the artifact + tag in a transaction when the service supports that
Why?
Because it will avoid race conditions. For example, if two runs of the same task happen in parallel, we may not be tagging the artifact we just created in this service call. It's not a huge deal because either artifact will work; it's just unexpected behavior.
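As a minimal sketch of the two-call flow the TODO refers to (hypothetical types and method names, not the actual client in this PR), the race comes from the artifact and its tag being created in separate service calls:

```go
package catalog

import "context"

// Hypothetical, simplified stand-ins for the Data Catalog entities.
type Artifact struct {
	ID      string
	Dataset string
}

type Tag struct {
	Name       string
	ArtifactID string
	Dataset    string
}

// Hypothetical client interface; the point is that artifact creation and
// tagging are separate RPCs, so the pair is not atomic.
type Client interface {
	CreateArtifact(ctx context.Context, a Artifact) error
	AddTag(ctx context.Context, t Tag) error
}

// writeCachedOutput publishes a task execution's outputs and then tags them.
// If two runs of the same task execute this concurrently, the other run's
// AddTag may land between our two calls, so the tag can end up pointing at
// its artifact rather than the one created here. Either artifact is a valid
// cached output, so the result is surprising rather than incorrect.
func writeCachedOutput(ctx context.Context, c Client, a Artifact, tagName string) error {
	if err := c.CreateArtifact(ctx, a); err != nil {
		return err
	}
	return c.AddTag(ctx, Tag{Name: tagName, ArtifactID: a.ID, Dataset: a.Dataset})
}
```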
LGTM
* Update the executions model to add cluster column
Datacatalog cache
This PR adds the catalog client to cache Task Executions to DataCatalog.
To cache a task execution, the client writes the execution's outputs to Catalog and tags them so they can be found again. To retrieve a cached entry, it resolves that tag back to the stored outputs.
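Reusing the hypothetical Client, Artifact, and Tag types from the sketch in the review thread above, plus an assumed GetArtifactByTag lookup, the round trip could look roughly like this (a sketch only, not the PR's actual API):

```go
// CacheClient extends the hypothetical Client with the read-path lookup.
type CacheClient interface {
	Client
	GetArtifactByTag(ctx context.Context, dataset, tagName string) (Artifact, error)
}

// getOrRun resolves the tag first; on a miss it runs the task, publishes the
// outputs as an Artifact, and tags it so later executions with the same
// inputs can reuse it. (A real implementation would distinguish a not-found
// error from other failures.)
func getOrRun(ctx context.Context, c CacheClient, dataset, tagName string, run func() Artifact) (Artifact, error) {
	if art, err := c.GetArtifactByTag(ctx, dataset, tagName); err == nil {
		return art, nil // cache hit: reuse previously published outputs
	}
	art := run() // cache miss: execute the task
	if err := c.CreateArtifact(ctx, art); err != nil {
		return Artifact{}, err
	}
	return art, c.AddTag(ctx, Tag{Name: tagName, ArtifactID: art.ID, Dataset: dataset})
}
```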
Here's how the fields look in Catalog when an execution is cached: every task instance is represented as a Dataset, every task execution is represented as an Artifact in that Dataset, and the execution's outputs are stored as ArtifactData.
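As a rough illustration of that mapping (simplified field names, assumed rather than taken from the IDL):

```go
// Simplified, illustrative shapes for the Catalog entities described above.

// DatasetID identifies a task instance; one Dataset per task.
type DatasetID struct {
	Project string
	Domain  string
	Name    string // e.g. the task's name
	Version string // e.g. derived from the task's version
}

// CachedArtifact is one task execution cached under that Dataset.
type CachedArtifact struct {
	ID      string
	Dataset DatasetID
	Data    []ArtifactData // the execution's outputs
}

// ArtifactData is a single named output value.
type ArtifactData struct {
	Name  string
	Value []byte // a serialized literal in the real service
}
```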
To retrieve the Artifact, we use the tag associated with it, which is composed of:
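As a hedged sketch of one way such a tag could be built (an assumption, not necessarily this PR's scheme), the tag name can be a deterministic hash over the task's input names and values, so that identical inputs always resolve to the same tag; GenerateTagName below is a hypothetical helper, not a function from this PR:

```go
package catalog

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// GenerateTagName derives a cache tag from the task's inputs: the inputs are
// serialized in a stable order and hashed, so executions with identical
// inputs map to the same tag.
func GenerateTagName(prefix string, inputs map[string]string) string {
	keys := make([]string, 0, len(inputs))
	for k := range inputs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable ordering keeps the hash independent of map iteration order

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s;", k, inputs[k])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return prefix + "-" + hex.EncodeToString(sum[:])
}
```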