fix(doc): Fix doc typo in transformer #10658

28 changes: 14 additions & 14 deletions metadata-ingestion/docs/transformer/dataset_transformer.md
@@ -126,7 +126,7 @@ transformers:
|--------------------|----------|--------------|-------------|---------------------------------------------------------------------|
| `owner_urns` | ✅ | list[string] | | List of owner urns. |
| `ownership_type` | | string | "DATAOWNER" | ownership type of the owners (either as enum or ownership type urn) |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove ownership from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

For transformer behaviour on `replace_existing` and `semantics`, please refer section [Relationship Between replace_existing And semantics](#relationship-between-replace_existing-and-semantics).
@@ -270,7 +270,7 @@ Note that whatever owners you send via `simple_remove_dataset_ownership` will ov
|-----------------------------|----------|--------------|---------------|------------------------------------------------------------------|
| `extract_tags_from` | ✅ | string | `urn` | Which field to extract tag from. Currently only `urn` is supported. |
| `extract_tags_regex` | ✅ | string | `.*` | Regex to use to extract tag.|
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove globalTags from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

Let’s suppose we’d like to add a dataset tags based on part of urn. To do so, we can use the `extract_dataset_tags` transformer that’s included in the ingestion framework.
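As a rough illustration of the idea only (this is not DataHub's implementation), the sketch below applies a regex with one capture group to a dataset urn and treats the captured text as the tag name. The helper name `extract_tag`, the sample regex, and the sample urn are all assumptions made for this example.

```python
import re
from typing import Optional


def extract_tag(urn: str, extract_tags_regex: str) -> Optional[str]:
    """Return the first capture group found in the urn, or None (hypothetical helper)."""
    match = re.search(extract_tags_regex, urn)
    if match is None or not match.groups():
        # No match, or the regex has no capture group to read a tag from.
        return None
    return match.group(1)


# Hypothetical urn and regex, chosen only for illustration.
sample_urn = "urn:li:dataset:(urn:li:dataPlatform:kafka,clusterid.USA-ops-team_table1,PROD)"
sample_regex = r"\.([^._]+)_"

print(extract_tag(sample_urn, sample_regex))
```

The sketch uses `re.search` so the pattern can match anywhere in the urn; whether the real transformer anchors the pattern at the start of the urn is not stated here.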
@@ -297,7 +297,7 @@ a tag called `USA-ops-team` and `Canada-marketing` will be added to them respect
| Field | Required | Type | Default | Description |
|-----------------------------|----------|--------------|---------------|------------------------------------------------------------------|
| `tag_urns` | ✅ | list[string] | | List of globalTags urn. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove globalTags from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

Let’s suppose we’d like to add a set of dataset tags. To do so, we can use the `simple_add_dataset_tags` transformer that’s included in the ingestion framework.
@@ -350,7 +350,7 @@ The config, which we’d append to our ingestion recipe YAML, would look like th
| Field | Required | Type | Default | Description |
|-----------------------------|----------|----------------------|-------------|---------------------------------------------------------------------------------------|
| `tag_pattern` | ✅ | map[regx, list[urn]] | | Entity urn with regular expression and list of tags urn apply to matching entity urn. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove globalTags from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

Let’s suppose we’d like to append a series of tags to specific datasets. To do so, we can use the `pattern_add_dataset_tags` module that’s included in the ingestion framework. This will match the regex pattern to `urn` of the dataset and assign the respective tags urns given in the array.
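The matching step described above can be pictured with a small, self-contained sketch. It is an illustration under stated assumptions, not the transformer's actual code: the function name `tags_for_urn`, the sample rules, and the sample urns are hypothetical, and the sketch collects tags from every matching pattern, which may differ from how the real transformer resolves multiple matches.

```python
import re
from typing import Dict, List


def tags_for_urn(urn: str, tag_pattern: Dict[str, List[str]]) -> List[str]:
    """Collect tag urns from every regex that matches the dataset urn (hypothetical helper)."""
    tags: List[str] = []
    for pattern, tag_urns in tag_pattern.items():
        # re.match anchors the pattern at the start of the urn.
        if re.match(pattern, urn):
            tags.extend(tag_urns)
    return tags


# Hypothetical rules: regex on the dataset urn -> list of tag urns.
sample_rules = {
    ".*example1.*": ["urn:li:tag:NeedsDocumentation", "urn:li:tag:Legacy"],
    ".*example2.*": ["urn:li:tag:NeedsDocumentation"],
}

sample_dataset = "urn:li:dataset:(urn:li:dataPlatform:kafka,example1,PROD)"
print(tags_for_urn(sample_dataset, sample_rules))
```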
@@ -407,7 +407,7 @@ The config, which we’d append to our ingestion recipe YAML, would look like th
| Field | Required | Type | Default | Description |
|-----------------------------|----------|--------------------------------------------|---------------|----------------------------------------------------------------------------|
| `get_tags_to_add` | ✅ | callable[[str], list[TagAssociationClass]] | | A function which takes entity urn as input and return TagAssociationClass. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove globalTags from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

If you'd like to add more complex logic for assigning tags, you can use the more generic add_dataset_tags transformer, which calls a user-provided function to determine the tags for each dataset.
@@ -477,7 +477,7 @@ Finally, you can install and use your custom transformer as [shown here](#instal
| Field | Required | Type | Default | Description |
|-----------------------------|----------|--------------|--------------|------------------------------------------------------------------|
| `path_templates` | ✅ | list[string] | | List of path templates. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove browsePath from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

If you would like to add to browse paths of dataset can use this transformer. There are 3 optional variables that you can use to get information from the dataset `urn`:
@@ -562,7 +562,7 @@ In this case, the resulting dataset will have only 1 browse path, the one from t
| Field | Required | Type | Default | Description |
|-----------------------------|----------|--------------|---------------|------------------------------------------------------------------|
| `term_urns` | ✅ | list[string] | | List of glossaryTerms urn. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove glossaryTerms from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

We can use a similar convention to associate [Glossary Terms](../../../docs/generated/ingestion/sources/business-glossary.md) to datasets.
@@ -617,7 +617,7 @@ The config, which we’d append to our ingestion recipe YAML, would look like th
| Field | Required | Type | Default | Description |
|-----------------------------|--------|----------------------|--------------|-------------------------------------------------------------------------------------------------|
| `term_pattern` | ✅ | map[regx, list[urn]] | | entity urn with regular expression and list of glossaryTerms urn apply to matching entity urn. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove glossaryTerms from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

We can add glossary terms to datasets based on a regex filter.
@@ -673,7 +673,7 @@ We can add glossary terms to datasets based on a regex filter.
| Field | Required | Type | Default | Description |
|-----------------------------|---------|----------------------|-------------|------------------------------------------------------------------------------------------------|
| `term_pattern` | ✅ | map[regx, list[urn]] | | entity urn with regular expression and list of glossaryTerms urn apply to matching entity urn. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove glossaryTerms from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

We can add glossary terms to schema fields based on a regex filter.
@@ -730,7 +730,7 @@ Note that only terms from the first matching pattern will be applied.
| Field | Required | Type | Default | Description |
|-----------------------------|----------|----------------------|-------------|---------------------------------------------------------------------------------------|
| `tag_pattern` | ✅ | map[regx, list[urn]] | | entity urn with regular expression and list of tags urn apply to matching entity urn. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove globalTags from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |


@@ -790,7 +790,7 @@ The config would look like this:
| Field | Required | Type | Default | Description |
|--------------------|---------|----------------|-------------|------------------------------------------------------------------|
| `properties` | ✅ | dict[str, str] | | Map of key value pair. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove datasetProperties from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

`simple_add_dataset_properties` transformer assigns the properties to dataset entity from the configuration.
@@ -849,7 +849,7 @@ overwrite the previous value.
| Field | Required | Type | Default | Description |
|--------------------------------|----------|--------------------------------------------|-------------|------------------------------------------------------------------|
| `add_properties_resolver_class`| ✅ | Type[AddDatasetPropertiesResolverBase] | | A class extends from `AddDatasetPropertiesResolverBase` |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove datasetProperties from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

If you'd like to add more complex logic for assigning properties, you can use the `add_dataset_properties` transformer, which calls a user-provided class (that extends from `AddDatasetPropertiesResolverBase` class) to determine the properties for each dataset.
@@ -948,7 +948,7 @@ transformers:
| Field | Required | Type | Default | Description |
|--------------------|----------|------------------------|---------------|------------------------------------------------------------------|
| `domains` | ✅ | list[union[urn, str]] | | List of simple domain name or domain urns. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove domains from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

For transformer behaviour on `replace_existing` and `semantics`, please refer section [Relationship Between replace_existing And semantics](#relationship-between-replace_existing-and-semantics).
@@ -1008,7 +1008,7 @@ transformers:
| Field | Required | Type | Default | Description |
|----------------------------|-----------|---------------------------------|-----------------|----------------------------------------------------------------------------------------------------------------------------|
| `domain_pattern` | ✅ | map[regx, list[union[urn, str]] | | dataset urn with regular expression and list of simple domain name or domain urn need to be apply on matching dataset urn. |
-| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
+| `replace_existing` | | boolean | `false` | Whether to remove domains from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |

Let’s suppose we’d like to append a series of domain to specific datasets. To do so, we can use the pattern_add_dataset_domain transformer that’s included in the ingestion framework.