Welcome to the Azure EmbodiedAI Sample repository on GitHub. With only about 500 lines of custom code, written with the help of ChatGPT, this simple Unity project demonstrates how to build an EmbodiedAI character that can speak, listen and see using Azure Cognitive Services (Azure Speech SDK), the Azure Kinect Body Tracking SDK and Unity Visual Scripting. The initial design for Microsoft's legendary Clippy character is inspired by the Blender project here.
Make everything as simple as possible, but not simpler.
This Unity sample targets the Windows Desktop platform. The Azure Kinect DK and the Azure Kinect Body Tracking SDK provide Body Tracking (seeing); NVIDIA GeForce RTX graphics with the CUDA processing mode are recommended for better performance, though the CPU processing mode may be used otherwise. The Azure Speech SDK handles speaking and listening; it is assumed that your PC has a microphone and speakers.
This sample is built based on the following foundational GitHub templates:
- Sample Unity Body Tracking Application
- Synthesize speech in Unity using the Speech SDK for Unity
- Recognize speech from a microphone in Unity using the Speech SDK for Unity
The solution architecture of this template is described in detail in the following Medium articles:
A solution demo video can be found here, and a code walkthrough is available here.
Please see the table below for compatibility guidance on the software versions the sample was developed and tested with.
Prerequisite | Download Link | Version |
---|---|---|
Unity | https://unity.com/download | 2021.3.16f1 (LTS) |
Azure Speech SDK for Unity | https://aka.ms/csspeech/unitypackage | 1.24.2 |
Azure Kinect Body Tracking SDK | https://www.microsoft.com/en-us/download/details.aspx?id=104221 | 1.1.2 |
NVIDIA CUDA Toolkit | https://developer.nvidia.com/cuda-downloads | 11.7 |
NVIDIA cuDNN for CUDA | https://developer.nvidia.com/rdp/cudnn-archive | 8.5.0 |
Please note that you will need to install Unity, the Azure Kinect Body Tracking SDK, the NVIDIA CUDA Toolkit and NVIDIA cuDNN for CUDA on your PC. The Azure Speech SDK for Unity, along with select DLLs from the Azure Kinect Body Tracking SDK, NVIDIA CUDA Toolkit and NVIDIA cuDNN, is already packaged with the project for convenience of setup and configuration.
After you've cloned or downloaded the code, please install the prerequisites. Note that select large files (DLLs) are stored with Git LFS, so you will need to manually download the files listed here.
The default processing mode for Body Tracking is set to CUDA to optimize performance; to configure CUDA support please refer to the guidance here and here. You may change the processing mode to CPU as necessary, as shown in the sketch below.
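For illustration, here is a minimal sketch of how the processing mode is typically selected when creating a body tracker with the Azure Kinect Body Tracking C# API (the exact script and names used in this project may differ):

```csharp
using Microsoft.Azure.Kinect.Sensor;
using Microsoft.Azure.Kinect.BodyTracking;

public static class TrackerSetup
{
    // The caller owns the device and tracker lifetimes and should dispose both.
    public static Tracker CreateTracker()
    {
        // Open the Azure Kinect device and start the cameras.
        Device device = Device.Open(0);
        device.StartCameras(new DeviceConfiguration
        {
            DepthMode = DepthMode.NFOV_Unbinned,
            ColorResolution = ColorResolution.R720p
        });

        // Select the processing mode here: Cuda for NVIDIA GPUs with CUDA support,
        // or Cpu as a slower fallback on machines without a supported GPU.
        var config = new TrackerConfiguration
        {
            ProcessingMode = TrackerProcessingMode.Cuda, // or TrackerProcessingMode.Cpu
            SensorOrientation = SensorOrientation.Default
        };

        return Tracker.Create(
            device.GetCalibration(DepthMode.NFOV_Unbinned, ColorResolution.R720p),
            config);
    }
}
```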
Because this sample leverages Azure Backend services deployed in the Cloud, please follow the Deployment guidance to deploy them. Once deployed, copy this Auth file to the Desktop of your PC and fill in the configuration details from the deployed Azure Backend services.
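If you want to see how such a file could be consumed in code, here is a hedged sketch of reading a JSON Auth file from the Desktop; the field names below are hypothetical placeholders, so align them with the actual Auth file schema that ships with the repository:

```csharp
using System;
using System.IO;
using UnityEngine;

// Hypothetical schema for illustration only; use the fields defined
// by the actual Auth file from the Deployment guidance.
[Serializable]
public class BackendAuthConfig
{
    public string speechKey;    // hypothetical field
    public string speechRegion; // hypothetical field
}

public static class AuthConfigLoader
{
    public static BackendAuthConfig Load(string fileName)
    {
        // The sample expects the Auth file on the user's Desktop.
        string path = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Desktop),
            fileName);
        return JsonUtility.FromJson<BackendAuthConfig>(File.ReadAllText(path));
    }
}
```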
This sample also introduces a number of custom C# nodes for Unity Visual Scripting; after you open the project in Unity, please Generate Nodes as described here.
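For context, a custom node is just a C# class deriving from Unity.VisualScripting.Unit; the sketch below is a generic illustration of the pattern (not one of the sample's actual nodes):

```csharp
using Unity.VisualScripting;

// Generic example of a custom Visual Scripting node. After adding or changing
// such classes, regenerate nodes so they appear in the graph fuzzy finder.
public class GreetNode : Unit
{
    [DoNotSerialize] public ControlInput enter;    // incoming flow
    [DoNotSerialize] public ControlOutput exit;    // outgoing flow
    [DoNotSerialize] public ValueInput userName;   // input string
    [DoNotSerialize] public ValueOutput greeting;  // output string

    private string result;

    protected override void Definition()
    {
        enter = ControlInput(nameof(enter), flow =>
        {
            result = $"Hello, {flow.GetValue<string>(userName)}!";
            return exit;
        });
        exit = ControlOutput(nameof(exit));
        userName = ValueInput<string>(nameof(userName), "Clippy");
        greeting = ValueOutput<string>(nameof(greeting), _ => result);
        Succession(enter, exit);
    }
}
```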
To run the sample, open SampleScene and hit Play. You may also select the Clippy game object in the Hierarchy and open the SampleStateMachine graph to visually inspect the sample flow.
This sample features One-Click Deployment for the necessary Azure Backend services. If you need to sign up for an Azure Subscription, please follow this link.
Capability | Docker Container | Protocol | Port |
---|---|---|---|
STT (Speech-to-Text) | mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text | WS(S) | 5001 |
TTS (Neural-Text-to-Speech) | mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech | WS(S) | 5002 |
LUIS (Intent Recognition) | mcr.microsoft.com/azure-cognitive-services/language/luis | HTTP(S) | 5003 |
ELASTIC-OSS Elasticsearch | docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2 | HTTP(S) | 9200 |
ELASTIC-OSS Kibana | docker.elastic.co/kibana/kibana-oss:7.10.2 | HTTP(S) | 5601 |
GPT-J | | HTTP(S) | 5004 |
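When the Speech containers above are used instead of the cloud endpoints, the Speech SDK can be pointed at them via SpeechConfig.FromHost. A minimal sketch, assuming the containers run on localhost with the ports from the table:

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

public static class ContainerSpeechConfig
{
    // Replace localhost with your container host if deployed remotely.
    public static SpeechRecognizer CreateRecognizer() =>
        new SpeechRecognizer(SpeechConfig.FromHost(new Uri("ws://localhost:5001")));

    public static SpeechSynthesizer CreateSynthesizer() =>
        new SpeechSynthesizer(SpeechConfig.FromHost(new Uri("ws://localhost:5002")));
}
```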
This minimalistic template can be a great jump start for your own EmbodiedAI project, and from here the possibilities are truly endless.
This template already gets you started with the following capabilities:
- Speech Synthesis: Azure Speech SDK SpeechSynthesizer class
- Speech Recognition: Azure Speech SDK SpeechRecognizer class (see the minimal sketch after this list)
- Computer Vision (Non-verbal interaction): Azure Kinect Body Tracking SDK
- Dialog Management: Unity Visual Scripting State Graph and Script Graphs
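As a quick orientation to the two speech capabilities, here is a minimal speak-then-listen sketch with the Azure Speech SDK; the key and region are assumed to come from your own Speech resource (or use the container endpoints shown earlier):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public static class SpeechQuickstart
{
    public static async Task RunAsync(string key, string region)
    {
        var config = SpeechConfig.FromSubscription(key, region);

        // Speak a prompt through the default speakers.
        using var synthesizer = new SpeechSynthesizer(config);
        await synthesizer.SpeakTextAsync("Hi! How can I help you today?");

        // Listen for a single utterance from the default microphone.
        using var recognizer = new SpeechRecognizer(config);
        SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"Heard: {result.Text}");
    }
}
```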
Please consider adding the following capabilities to your EmbodiedAI project as described in this Medium article:
- Natural Language Understanding: Language Understanding (LUIS), Conversational Language Understanding (CLU), etc.
- Natural Language Generation: Azure OpenAI Service
- More Computer Vision: Azure Kinect Sensor SDK
- Knowledge Base: Azure Cognitive Search (Semantic Search), Question Answering, etc.
- Any integrations with custom AI services via WebAPI: Azure Machine Learning, etc.
Note: If you are interested in leveraging Microsoft Power Virtual Agents (PVA) on Azure while building your EmbodiedAI project, please review this Medium article, which provides details about using the DialogServiceConnector class based on the Interact with a bot in C# Unity GitHub template.
Please also review the important Game Development Concepts in the context of the Azure Speech SDK, explained in detail in this article.
Potential future additions to this template include:
- Character Rig and animations using Unity Mecanim
- Azure Backend services deployment options using Bicep and Terraform
- Backend services deployed on the Edge using Helm chart on Kubernetes (K8S) cluster
- Sample template for Unreal Engine (UE) using Blueprints
- Sample template for WebGL using Babylon.js
This template features a simple yet robust implementation of Lip Sync Visemes using the Azure Speech SDK, as explained in this video. You can introduce visemes for your character using Maya Blend shapes, Blender shape keys, etc. Then in Unity you can animate the visemes by manipulating Blend shape weights, by means of a Unity Blend tree, etc.
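A minimal sketch of this approach, assuming your character's mesh exposes one blend shape per viseme and that you maintain your own mapping from Azure viseme IDs to blend shape indices:

```csharp
using Microsoft.CognitiveServices.Speech;
using UnityEngine;

public class VisemeDriver : MonoBehaviour
{
    public SkinnedMeshRenderer face; // mesh with one blend shape per viseme
    private int currentViseme;

    public void Attach(SpeechSynthesizer synthesizer)
    {
        // VisemeReceived fires ahead of audio playback; e.AudioOffset (in ticks)
        // can be used to schedule mouth shapes precisely in a real implementation.
        synthesizer.VisemeReceived += (s, e) => currentViseme = (int)e.VisemeId;
    }

    private void Update()
    {
        // Naive approach: fully weight the current viseme's blend shape each frame.
        for (int i = 0; i < face.sharedMesh.blendShapeCount; i++)
            face.SetBlendShapeWeight(i, i == currentViseme ? 100f : 0f);
    }
}
```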
Sample context:
Clippy is an endearing and helpful digital assistant, designed to make using Microsoft Office Suite of products more efficient and user-friendly. With his iconic paperclip shape and friendly personality, Clippy is always ready and willing to assist users with any task or question they may have. His ability to anticipate and address potential issues before they even arise has made him a beloved and iconic figure in the world of technology, widely recognized as an invaluable tool for productivity.
Experiment (1/23/2023): Azure OpenAI is deployed in the South Central US region and the Unity App client runs in the West US geo.
Chat turn # | Sample Human request | Responsiveness test example 1 | Responsiveness test example 2 |
---|---|---|---|
1 | What is Azure Cognitive Search? | Total tokens: 117, Azure OpenAI processing timing: 90.6115 ms, Azure OpenAI Studio browser request/response timing: 1.56 s | Total tokens: 117, Azure OpenAI processing timing: 96.2115 ms, Azure OpenAI Studio browser request/response timing: 1.40 s |
2 | What are the benefits of using Azure Cognitive Search? | Total tokens: 177, Azure OpenAI processing timing: 128.6817 ms, Azure OpenAI Studio browser request/response timing: 1.41 s | Total tokens: 167, Azure OpenAI processing timing: 73.1746 ms, Azure OpenAI Studio browser request/response timing: 957.91 ms |
3 | How can I index my data with Azure Cognitive Search? | Total tokens: 246, Azure OpenAI processing timing: 168.1983 ms, Azure OpenAI Studio browser request/response timing: 1.34 s | Total tokens: 229, Azure OpenAI processing timing: 229.8757 ms, Azure OpenAI Studio browser request/response timing: 597.07 ms |
When building modern immersive EmbodiedAI experiences, responsiveness, fluidity and speed of interaction are very important. GPU Acceleration will therefore be an important factor in the successful adoption of your solution.
We've already described above how this sample takes advantage of GPU Acceleration with the Azure Kinect Body Tracking SDK.
GPU Acceleration can also help with any custom Machine Learning models you may want to build, train, deploy and expose as a Web Service (Web API) for consumption in your EmbodiedAI solution. More information about building and deploying custom Machine Learning models in Azure Cloud and on the Edge can be found in these Medium articles: here and here. PyTorch is a popular open-source machine learning framework for building such models and has built-in support for GPU Acceleration. When building and deploying custom models in Azure Cloud, you can take advantage of Azure Machine Learning capabilities. For Hybrid and Edge scenarios, you may consider Azure Stack managed appliances, for example, Azure Stack Edge Pro with GPU, as described in these Medium articles: here and here.

For the latter, you may also need to build your own Docker images for model training, process optimization or real-time inference. The table below provides a handful of base-image options for GPU-accelerated PyTorch applications (MNIST example) using NVIDIA GPUs with CUDA support, suitable for training and/or inference.
Base Docker Image | Description | Dockerfile |
---|---|---|
python:3.10 | Python | Link |
nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04 | NVIDIA CUDA + cuDNN | Link |
pytorch/pytorch:1.13.1-cuda11.6-cudnn8-devel | PyTorch | Link |
mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04 | Azure ML (AML) | Link |
nvcr.io/nvidia/pytorch:23.02-py3 | NVIDIA PyTorch | Link |
mcr.microsoft.com/vscode/devcontainers/anaconda | VSCode Dev Container | Link |
To build a Docker image, use the `docker build -t your_name .` command. To run a Docker container with GPU acceleration, use the `docker run --gpus all your_name` command.
Reference local setup: a laptop with an NVIDIA RTX GPU (great for training), a desktop with 2 NVIDIA GTX GPUs, and an Azure Stack Edge Pro with an NVIDIA Tesla T4 GPU (great for inference).
This code is provided "as is", without warranties, to be used at your own risk.