FlorentBedecarratsNM/pyapacheatlas


PyApacheAtlas: API Support for Azure Purview and Apache Atlas

A Python package for working with the Apache Atlas API. It supports bulk loading, custom lineage, and more through a Pythonic set of classes and Excel templates.

The package currently supports:

  • Bulk upload of entities.
  • Bulk upload of type definitions.
  • Creating custom lineage between two existing entities.
  • Creating custom table and complex column level lineage in the Hive Bridge style.
    • Supports Azure Purview ColumnMapping Attributes.
  • Creating a column lineage scaffolding as in the Hive Bridge style.
  • Performing "What-If" analysis to check if...
    • Your entities are valid types.
    • Your entities are missing required attributes.
    • Your entities are using undefined attributes.
  • Working with the glossary.
    • Uploading terms.
    • Downloading individual or all terms.
  • Working with classifications.
    • Classify one entity with multiple classifications.
    • Classify multiple entities with a single classification.
    • Remove classification ("declassify") from an entity.
  • Working with relationships.
    • Creating arbitrary relationships between entities (e.g. associating a given column with a table).
    • Uploading relationship definitions.
  • Deleting types (by name) or entities (by guid).
  • Search (only for Azure Purview advanced search).
  • Authentication to Azure Purview via Service Principal.
  • Authentication using basic authentication of username and password for open source Atlas.
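
To make the three "What-If" checks listed above concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the function name `what_if_report` and the simplified shape of `type_defs` (a mapping from type name to required and optional attribute names) are assumptions made for illustration only.

```python
def what_if_report(entities, type_defs):
    """Check entities against type definitions without uploading anything.

    `type_defs` is a hypothetical, simplified mapping:
    typeName -> {"required": set of names, "optional": set of names}.
    """
    report = {"unknown_types": [], "missing_required": {}, "undefined_attributes": {}}
    for entity in entities:
        label = entity["attributes"].get("qualifiedName", "<unnamed>")
        type_def = type_defs.get(entity["typeName"])
        if type_def is None:
            # Check 1: the entity's type is not defined at all.
            report["unknown_types"].append(label)
            continue
        supplied = set(entity["attributes"])
        # Check 2: required attributes that were not supplied.
        missing = type_def["required"] - supplied
        # Check 3: supplied attributes the type does not define.
        undefined = supplied - type_def["required"] - type_def["optional"]
        if missing:
            report["missing_required"][label] = sorted(missing)
        if undefined:
            report["undefined_attributes"][label] = sorted(undefined)
    return report


type_defs = {
    "demo_table": {"required": {"name", "qualifiedName"}, "optional": {"description"}}
}
entities = [
    {"typeName": "demo_table",
     "attributes": {"name": "my table", "qualifiedName": "somedb.schema.mytable"}},
    {"typeName": "demo_table",
     "attributes": {"qualifiedName": "somedb.schema.other", "owner_team": "x"}},
    {"typeName": "not_a_type",
     "attributes": {"qualifiedName": "somedb.schema.ghost"}},
]
report = what_if_report(entities, type_defs)
print(report)
```

The first entity passes all three checks, the second is missing `name` and carries an undefined `owner_team` attribute, and the third uses a type that does not exist.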

Quickstart

Install from PyPI

python -m pip install pyapacheatlas

Create a Purview Client Connection

Provides connectivity to your Atlas / Azure Purview service. Supports getting and uploading entities and type defs.

from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import PurviewClient

auth = ServicePrincipalAuthentication(
    tenant_id = "", 
    client_id = "", 
    client_secret = ""
)

# Create a client to connect to your service.
client = PurviewClient(
    account_name = "Your-Purview-Account-Name",
    authentication = auth
)
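
The empty strings in the example are placeholders for real credentials. One way to keep secrets out of source code is to read them from environment variables, as in the sketch below. The helper `load_service_principal_settings` and the variable names `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` are assumptions for illustration; this package does not require them.

```python
import os


def load_service_principal_settings(environ=os.environ):
    """Collect the three service principal values from environment variables.

    The environment variable names are an assumed convention, not something
    the package itself looks for.
    """
    names = {
        "tenant_id": "AZURE_TENANT_ID",
        "client_id": "AZURE_CLIENT_ID",
        "client_secret": "AZURE_CLIENT_SECRET",
    }
    missing = [env for env in names.values() if not environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Keys match the keyword arguments used in the example above.
    return {arg: environ[env] for arg, env in names.items()}


# Demonstration with a stand-in mapping instead of the real environment.
settings = load_service_principal_settings(
    {"AZURE_TENANT_ID": "t", "AZURE_CLIENT_ID": "c", "AZURE_CLIENT_SECRET": "s"}
)
print(sorted(settings))
```

Because the returned keys match the keyword arguments shown above, the result could be passed along as `ServicePrincipalAuthentication(**settings)`.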

If you want to use the generic AtlasClient with Purview, the Atlas endpoint for Purview is https://{your_purview_name}.catalog.purview.azure.com/api/atlas/v2. The PurviewClient builds this endpoint URL for you from the account name and is the recommended way to use this package with Purview.
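
As a small illustration of the URL pattern above, the sketch below formats the endpoint from an account name. The helper `purview_atlas_endpoint` is hypothetical and exists only for this example; the account name "contoso" is a placeholder.

```python
def purview_atlas_endpoint(account_name):
    """Return the Atlas v2 endpoint for a Purview account, per the pattern above."""
    return f"https://{account_name}.catalog.purview.azure.com/api/atlas/v2"


endpoint = purview_atlas_endpoint("contoso")
print(endpoint)
```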

Create Entities "By Hand"

You can also create your own entities by hand with the helper AtlasEntity class, then convert each one with to_json to prepare it for upload. The same client also retrieves type definitions and existing entities by guid.

from pyapacheatlas.core import AtlasEntity

# Get All Type Defs
all_type_defs = client.get_all_typedefs()

# Get Specific Entities
list_of_entities = client.get_entity(guid=["abc-123-def","ghi-456-jkl"])

# Create a new entity.
# A negative guid is a placeholder that marks the entity as new.
ae = AtlasEntity(
    name = "my table",
    typeName = "demo_table",
    qualified_name = "somedb.schema.mytable",
    guid = -1000
)

# Upload that entity with the client
upload_results = client.upload_entities([ae.to_json()])

Create Entities from Excel

Read from a standardized Excel template that supports...

  • Bulk uploading entities into your data catalog.
  • Creating custom table and column level lineage.
  • Creating custom type definitions for datasets.
  • Creating custom lineage between existing assets / entities in your data catalog.

See end-to-end samples for each scenario in the Excel samples.

Learn more about the Excel features and configuration in the wiki.
