page_type | description | languages | products
---|---|---|---
sample | This sample demonstrates a data cleaning pipeline with Azure Functions written in Python. | |
This sample demonstrates a data cleaning pipeline built with Azure Functions in Python. An HTTP event from Event Grid triggers the functions, which use pandas to clean and reconcile CSV files. It models a real use case: raw CSV files are cleaned as they arrive in blob storage, and the cleaned outputs are then merged into a single file.
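To make the trigger path concrete, here is a minimal sketch of how an HTTP endpoint might interpret an Event Grid delivery. It is not the sample's actual code: the function name `interpret_event_grid_body` and its return shape are hypothetical. It assumes the standard Event Grid event schema, in which the body is a JSON array, a subscription validation event carries `data.validationCode`, and a `Microsoft.Storage.BlobCreated` event carries the blob URL in `data.url`.

```python
import json

# Event type names from the Event Grid event schema.
VALIDATION_EVENT = "Microsoft.EventGrid.SubscriptionValidationEvent"
BLOB_CREATED_EVENT = "Microsoft.Storage.BlobCreated"


def interpret_event_grid_body(body: str) -> dict:
    """Hypothetical helper: decide what an HTTP-triggered function should do.

    Returns {"validationResponse": code} for the subscription handshake,
    otherwise {"blobs_to_clean": [...]} listing newly created CSV blob URLs.
    """
    events = json.loads(body)
    if isinstance(events, dict):
        # Tolerate a single event object as well as the usual array.
        events = [events]

    blob_urls = []
    for event in events:
        event_type = event.get("eventType")
        data = event.get("data", {})
        if event_type == VALIDATION_EVENT:
            # Echo the code back so Event Grid accepts the webhook endpoint.
            return {"validationResponse": data["validationCode"]}
        if event_type == BLOB_CREATED_EVENT:
            url = data.get("url", "")
            if url.lower().endswith(".csv"):
                blob_urls.append(url)
    return {"blobs_to_clean": blob_urls}
```

In a real function the returned URLs would be handed to the pandas cleaning step; that step is left out here because its logic is specific to the sample's data.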
- Install Python 3.6+
- Install Functions Core Tools
- Install Docker
- Note: On Windows, use Ubuntu on WSL to run the deploy script
- Deploy through Azure CLI
  - Open the Azure CLI and run `az group create -l [region] -n [resourceGroupName]` to create a resource group in your Azure subscription (e.g. [region] could be westus2, eastus, etc.)
  - Run `az group deployment create --name [deploymentName] --resource-group [resourceGroupName] --template-file azuredeploy.json` (on current Azure CLI versions, where this command group is deprecated, the equivalent is `az deployment group create` with the same arguments)
- Deploy Function App
  - Create and activate a virtual environment
  - Run `func azure functionapp publish [functionAppName] --build-native-deps`
- Upload the s1.csv file into the c1raw container
- Watch Event Grid trigger the CleanTrigger1 function and produce "cleaned_s1_raw.csv"
- Repeat the same for s2.csv in the c2raw container
- Now send the following HTTP request to the Reconcile function to merge the two cleaned files:
{
"file_1_url" : "https://{storagename}.blob.core.windows.net/c1raw/cleaned_s1_raw.csv",
"file_2_url" : "https://{storagename}.blob.core.windows.net/c2raw/cleaned_s2_raw.csv",
"batchId" : "1122"
}
- Watch it produce the final.csv file
- A Logic App can be used to call the Reconcile function with batch IDs
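
The Reconcile request from the steps above can also be built from a script. The following is a minimal sketch using only the Python standard library. The helper names, the use of POST, and the function URL are assumptions not stated in this README; check the deployed function for its real route, HTTP method, and any access key it requires.

```python
import json
import urllib.request


def build_reconcile_payload(storage_name: str, batch_id: str) -> dict:
    """Build the JSON body shown above for a given storage account (hypothetical helper)."""
    base = f"https://{storage_name}.blob.core.windows.net"
    return {
        "file_1_url": f"{base}/c1raw/cleaned_s1_raw.csv",
        "file_2_url": f"{base}/c2raw/cleaned_s2_raw.csv",
        "batchId": batch_id,
    }


def build_reconcile_request(function_url: str, payload: dict) -> urllib.request.Request:
    """Wrap the payload in an HTTP request. POST is an assumption."""
    return urllib.request.Request(
        function_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_reconcile_request(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

To use it, call `build_reconcile_payload` with your storage account name and a batch ID such as `"1122"`, pass the result to `build_reconcile_request` together with the URL of your deployed Reconcile function, and then call `send_reconcile_request`. Keeping payload construction separate from sending makes it easy to loop over several batch IDs, which is the same job the Logic App would do.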