docs: add docs on term suggestion (#11606)
hsheth2 authored Oct 15, 2024
1 parent e0939c7 commit 0d06a61
Showing 4 changed files with 136 additions and 17 deletions.
12 changes: 12 additions & 0 deletions docs-website/sidebars.js
@@ -113,6 +113,18 @@ module.exports = {
id: "docs/automations/snowflake-tag-propagation",
className: "saasOnly",
},
+{
+label: "AI Classification",
+type: "doc",
+id: "docs/automations/ai-term-suggestion",
+className: "saasOnly",
+},
+{
+label: "AI Documentation",
+type: "doc",
+id: "docs/automations/ai-docs",
+className: "saasOnly",
+},
],
},
{
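The two sidebar items added in this hunk use doc ids that mirror the paths of the two Markdown files this commit adds, minus the `.md` extension. The following is a minimal sketch, assuming plain Node.js and assuming that id-to-path correspondence holds (an observation from this commit, not a documented DataHub rule). The helpers `findUnresolvedIds` and `findMissingSaasOnly` are hypothetical and not part of the repository. They check that each new item resolves to one of the added files and carries the `saasOnly` class:

```javascript
// Sidebar items added to docs-website/sidebars.js in this commit, copied from the hunk above.
const newItems = [
  { label: "AI Classification", type: "doc", id: "docs/automations/ai-term-suggestion", className: "saasOnly" },
  { label: "AI Documentation", type: "doc", id: "docs/automations/ai-docs", className: "saasOnly" },
];

// Markdown files added by this commit (see the changed-file list).
const addedFiles = ["docs/automations/ai-docs.md", "docs/automations/ai-term-suggestion.md"];

// Hypothetical helper: returns the ids of "doc" items that have no matching
// "<id>.md" entry in `files`, assuming ids mirror file paths without the extension.
function findUnresolvedIds(items, files) {
  const fileSet = new Set(files);
  return items
    .filter((item) => item.type === "doc")
    .map((item) => item.id)
    .filter((id) => !fileSet.has(`${id}.md`));
}

// Hypothetical helper: returns the labels of items not marked as SaaS-only.
function findMissingSaasOnly(items) {
  return items.filter((item) => item.className !== "saasOnly").map((item) => item.label);
}

console.log(findUnresolvedIds(newItems, addedFiles)); // []
console.log(findMissingSaasOnly(newItems)); // []
```

Both lists come back empty for this commit. A non-empty result would point to a sidebar entry whose target page was renamed or never added, or to a page missing the SaaS-only styling.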
36 changes: 36 additions & 0 deletions docs/automations/ai-docs.md
@@ -0,0 +1,36 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# AI Documentation

<FeatureAvailability saasOnly />

:::info

This feature is currently in closed beta. Reach out to your Acryl representative to get access.

:::

With AI-powered documentation, you can automatically generate documentation for tables and columns.

<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/_7DieZeZspY?si=Q5FkCA0gZPEFMj0Y" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</p>

## Configuring

No configuration is required - just hit "Generate" on any table or column in the UI.

## How it works

Generating good documentation requires a holistic understanding of the data. Information we take into account includes, but is not limited to:

- Dataset name and any existing documentation
- Column name, type, description, and sample values
- Lineage relationships to upstream and downstream assets
- Metadata about other related assets

Data privacy: Your metadata is not sent to any third-party LLMs. We use AWS Bedrock internally, which means all metadata remains within the Acryl AWS account. We do not fine-tune on customer data.

## Limitations

- This feature is powered by an LLM, which can produce inaccurate results. While we've taken steps to reduce the likelihood of hallucinations, they can still occur.
72 changes: 72 additions & 0 deletions docs/automations/ai-term-suggestion.md
@@ -0,0 +1,72 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# AI Glossary Term Suggestions

<FeatureAvailability saasOnly />

:::info

This feature is currently in closed beta. Reach out to your Acryl representative to get access.

:::

The AI Glossary Term Suggestion automation uses LLMs to suggest [Glossary Terms](../glossary/business-glossary.md) for tables and columns in your data.

This is useful for improving coverage of glossary terms across your organization, which is important for compliance and governance efforts.

This automation can:

- Automatically suggest glossary terms for tables and columns.
- Go beyond a predefined set of terms and work with your business glossary.
- Generate [proposals](../managed-datahub/approval-workflows.md) for owners to review, or automatically add terms to tables/columns.
- Automatically adjust to human-provided feedback and curation (coming soon).

## Prerequisites

- A business glossary with terms defined. Additional metadata, like documentation and existing term assignments, will improve the accuracy of our suggestions.

## Configuring

1. **Navigate to Automations**: Click on 'Govern' > 'Automations' in the navigation bar.

<p align="center">
<img width="30%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automations-nav-link.png"/>
</p>

2. **Create the Automation**: Click on 'Create' and select 'AI Glossary Term Suggestions'.

<p align="center">
<img width="40%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/ai-term-suggestion/automation-type.png"/>
</p>

3. **Configure the Automation**: Fill in the required fields to configure the automation.
The main fields to configure are (1) what terms to use for suggestions and (2) what entities to generate suggestions for.

<p align="center">
<img width="50%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/ai-term-suggestion/automation-config.png"/>
</p>

4. Once it's enabled, that's it! You'll start to see terms show up in the UI, either on assets or in the proposals page.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/ai-term-suggestion/term-proposals.png"/>
</p>

## How it works

The automation will scan through all the datasets matched by the configured filters. For each one, it will generate suggestions.
If new entities are added that match the configured filters, those will also be classified within 24 hours.

We take into account the following metadata when generating suggestions:

- Dataset name and description
- Column name, type, description, and sample values
- Glossary term name, documentation, and hierarchy
- Feedback loop: existing assignments and accepted/rejected proposals (coming soon)

Data privacy: Your metadata is not sent to any third-party LLMs. We use AWS Bedrock internally, which means all metadata remains within the Acryl AWS account. We do not fine-tune on customer data.

## Limitations

- A single configured automation can classify at most 10k entities.
- We cannot do partial reclassification. If you add a new column to an existing table, we won't regenerate suggestions for that table.
33 changes: 16 additions & 17 deletions docs/automations/snowflake-tag-propagation.md
@@ -1,4 +1,3 @@
-
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# Snowflake Tag Propagation Automation
@@ -20,22 +19,22 @@ both columns and tables back to Snowflake. This automation is available in DataH

1. **Navigate to Automations**: Click on 'Govern' > 'Automations' in the navigation bar.

-<p align="left">
-<img width="20%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automations-nav-link.png"/>
+<p align="center">
+<img width="20%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automations-nav-link.png"/>
</p>

2. **Create An Automation**: Click on 'Create' and select 'Snowflake Tag Propagation'.

-<p align="left">
-<img width="30%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/snowflake-tag-propagation/automation-type.png"/>
+<p align="center">
+<img width="60%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/snowflake-tag-propagation/automation-type.png"/>
</p>

-3. **Configure Automation**: Fill in the required fields to connect to Snowflake, along with the name, description, and category.
-Note that you can limit propagation based on specific Tags and Glossary Terms. If none are selected, then ALL Tags or Glossary Terms will be automatically
-propagated to Snowflake tables and columns. Finally, click 'Save and Run' to start the automation
+3. **Configure Automation**: Fill in the required fields to connect to Snowflake, along with the name, description, and category.
+Note that you can limit propagation based on specific Tags and Glossary Terms. If none are selected, then ALL Tags or Glossary Terms will be automatically
+propagated to Snowflake tables and columns. Finally, click 'Save and Run' to start the automation

-<p align="left">
-<img width="30%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/snowflake-tag-propagation/automation-form.png"/>
+<p align="center">
+<img width="60%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/snowflake-tag-propagation/automation-form.png"/>
</p>

## Propagating for Existing Assets
@@ -46,13 +45,13 @@ Note that it may take some time to complete the initial back-filling process, de
To do so, navigate to the Automation you created in Step 3 above, click the 3-dot "More" menu

<p align="left">
-<img width="15%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-more-menu.png"/>
+<img width="20%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-more-menu.png"/>
</p>

and then click "Initialize".

<p align="left">
-<img width="15%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-initialize.png"/>
+<img width="20%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-initialize.png"/>
</p>

This one-time step will kick off the back-filling process for existing descriptions. If you only want to begin propagating
@@ -68,21 +67,21 @@ that you no longer want propagated descriptions to be visible.
To do this, navigate to the Automation you created in Step 3 above, click the 3-dot "More" menu

<p align="left">
-<img width="15%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-more-menu.png"/>
+<img width="20%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-more-menu.png"/>
</p>

and then click "Rollback".

<p align="left">
-<img width="15%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-rollback.png"/>
+<img width="20%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/automation-rollback.png"/>
</p>

This one-time step will remove all propagated tags and glossary terms from Snowflake. To simply stop propagating new tags, you can disable the automation.

## Viewing Propagated Tags

-You can view propagated Tags (and corresponding DataHub URNs) inside the Snowflake UI to confirm the automation is working as expected.
+You can view propagated Tags (and corresponding DataHub URNs) inside the Snowflake UI to confirm the automation is working as expected.

-<p align="left">
-<img width="50%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/snowflake-tag-propagation/view-snowflake-tags.png"/>
+<p align="center">
+<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/snowflake-tag-propagation/view-snowflake-tags.png"/>
</p>
