autonomi-ai · spillai · Feb 1, 2024 · Feb 1, 2024
diff --git a/README.md b/README.md
@@ -1,37 +1,30 @@
-<center><img src="./docs/assets/nos-header.svg" alt="Nitrous Oxide for your AI Infrastructure"></center>
+<center><img src="./docs/assets/nos-header.svg" alt="Nitro Boost for your AI Infrastructure"></center>
 <p></p>
 <p align="center">
+<a href="https://docs.nos.run/"><b>Website</b></a> | <a href="https://docs.nos.run/"><b>Docs</b></a> | <a href="https://github.com/autonomi-ai/nos/tree/main/examples/tutorials"><b>Tutorials</b></a> | <a href="https://github.com/autonomi-ai/nos-playground"><b>Playground</b></a> | <a href="https://docs.nos.run/docs/blog"><b>Blog</b></a> | <a href="https://discord.gg/QAGgvTuvgg"><b>Discord</b></a>
+</p>
+<p align="center">
 <a href="https://pypi.org/project/torch-nos/"><img alt="PyPI Version" src="https://badge.fury.io/py/torch-nos.svg"></a>
 <a href="https://pypi.org/project/torch-nos/"><img alt="PyPI Version" src="https://img.shields.io/pypi/pyversions/torch-nos"></a>
 <a href="https://www.pepy.tech/projects/torch-nos"><img alt="PyPI Downloads" src="https://img.shields.io/pypi/dm/torch-nos"></a>
-<a href="https://github.com/autonomi-ai/nos/blob/main/LICENSE"><img alt="PyPi Downloads" src="https://img.shields.io/github/license/autonomi-ai/nos.svg"></a><br>
+<a href="https://hub.docker.com/repository/docker/autonomi/nos/general"><img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/autonomi/nos.svg"></a><br>
+<a href="https://github.com/autonomi-ai/nos/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/autonomi-ai/nos.svg"></a>
 <a href="https://discord.gg/QAGgvTuvgg"><img alt="Discord" src="https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord"></a>
 <a href="https://twitter.com/autonomi_ai"><img alt="PyPi Version" src="https://img.shields.io/twitter/follow/autonomi_ai.svg?style=social&logo=twitter"></a>
 </p>
-<p align="center">
-<a href="https://docs.nos.run/"><b>Website</b></a> | <a href="https://docs.nos.run/"><b>Docs</b></a> | <a href="https://docs.nos.run/docs/blog"><b>Blog</b></a> |    <a href="https://discord.gg/QAGgvTuvgg"><b>Discord</b></a>
-</p>
 
-## What is NOS?
-**NOS (`torch-nos`)** is a fast and flexible Pytorch inference server, specifically designed for optimizing and running inference of popular foundational AI models.
-<br>
+**NOS** is a fast and flexible PyTorch inference server that runs on any cloud or AI HW.
 
-## 🛠️ **Why use NOS?**
+## 🛠️ Key Features
 
 - 👩‍💻 **Easy-to-use**: Built for [PyTorch](https://pytorch.org/) and designed to optimize, serve and auto-scale Pytorch models in production without compromising on developer experience.
-- 🥷 **Flexible**: Run and serve several foundational AI models ([Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [CLIP](https://huggingface.co/openai/clip-vit-base-patch32), [Whisper](https://huggingface.co/openai/whisper-large-v2)) in a single place.
-- 🔌 **Pluggable:** Plug your front-end to NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding all kinds of ML model deployment hassles.
-- 🚀 **Scalable**: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
-- 📦 **Extensible**: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
-- ⚙️ **HW-accelerated:** Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
-- ☁️ **Cloud-agnostic:** Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
-
-
-> **NOS** inherits its name from **N**itrous **O**xide **S**ystem, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.
-<br>
+- 🥷 **Multi-modal & Multi-model**: Serve multiple foundational AI models ([LLMs](https://github.com/autonomi-ai/nos/blob/main/nos/models/llm.py), [Diffusion](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [Embeddings](https://github.com/autonomi-ai/nos/blob/main/nos/models/clip.py), [Speech-to-Text](https://github.com/autonomi-ai/nos/blob/main/nos/models/whisper.py) and [Object Detection](https://github.com/autonomi-ai/nos/blob/main/nos/models/yolox.py)) simultaneously, in a single server.
+- ⚙️ **HW-aware Runtime:** Deploy PyTorch models effortlessly on modern AI accelerators (NVIDIA GPUs and AWS Inferentia2, with AMD support coming soon) and even on CPUs.
+- ☁️ **Cloud-agnostic Containers:** Run on any cloud (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
 
 ## 🔥 What's New
 
+* **[Feb 2024]** ✍️ [blog] [Introducing the NOS Inferentia2 (`inf2`) runtime](https://docs.nos.run/docs/blog/introducing-the-nos-inferentia2-runtime.html).
 * **[Jan 2024]** ✍️ [blog] [Serving LLMs on a budget](https://docs.nos.run/docs/blog/serving-llms-on-a-budget.html) with [SkyServe](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html).
 * **[Jan 2024]** 📚 [docs] [NOS x SkyPilot Integration](https://docs.nos.run/docs/integrations/skypilot.html) page!
 * **[Jan 2024]** ✍️ [blog] [Getting started with NOS tutorials](https://docs.nos.run/docs/blog/-getting-started-with-nos-tutorials.html) is available [here](./examples/tutorials/)!
@@ -42,78 +35,72 @@
 We highly recommend that you go to our [quickstart guide](https://docs.nos.run/docs/quickstart.html) to get started. To install the NOS client, you can run the following command:
 
 ```bash
-conda create -n nos python=3.8
+conda create -n nos python=3.8 -y
 conda activate nos
 pip install torch-nos
 ```
 
 Once the client is installed, you can start the NOS server via the NOS `serve` CLI. This will automatically detect your local environment, download the docker runtime image and spin up the NOS server:
 
 ```bash
-nos serve up --http
+nos serve up --http --logging-level INFO
 ```
 
-You are now ready to run your first inference request with NOS! You can run any of the following commands to try things out.
-
-*Note:* For the above quickstart to work out of the box, we expect the user to have [Docker](https://docs.docker.com/get-docker/), [Nvidia Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) and [Docker Compose](https://docs.docker.com/compose/install/) pre-installed on their machine. If you run into any issues, please visit our [quickstart](https://docs.nos.run/docs/quickstart.html) page or ping us on [Discord](https://discord.gg/QAGgvTuvgg).
+You are now ready to run your [first inference request](#👩‍💻-what-can-nos-do) with NOS! Try any of the examples below. For more detailed server logs, start the server with `--logging-level DEBUG`.
 
 ## 👩‍💻 **What can NOS do?**
 
 ### 💬 Chat / LLM Agents (ChatGPT-as-a-Service)
 ---
-NOS provides an OpenAI-compatible server with streaming support so that you can connect your favorite LLM client.
+NOS provides an OpenAI-compatible server with streaming support, so any OpenAI-compatible LLM client can talk to NOS directly.
 
 <img src="docs/assets/llama_nos.gif" width="400">
 
-<table>
-<tr>
-<td> gRPC API ⚡ </td>
-<td> REST API </td>
-</tr>
-<tr>
-<td>
+<br>
+<details>
+<summary> API / Usage</summary>
+<br>
 
+<b>gRPC API ⚡</b>
 ```python
 from nos.client import Client
 
 client = Client("[::]:50051")
 
-model = client.Module("meta-llama/Llama-2-7b-chat-hf")
+model = client.Module("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
 response = model.chat(message="Tell me a story of 1000 words with emojis", _stream=True)
 ```
 
-</td>
-<td>
-
+<b>REST API</b>
 ```bash
 curl \
 -X POST http://localhost:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
-    "model": "meta-llama/Llama-2-7b-chat-hf",
-    "messages": [{"role": "user", "content": "Tell me a story of 1000 words with emojis"}],
-    "temperature": 0.7, "stream": true
+    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+    "messages": [{
+        "role": "user",
+        "content": "Tell me a story of 1000 words with emojis"
+    }],
+    "temperature": 0.7,
+    "stream": true
   }'
 ```
 
-</td>
-</tr>
-</table>
-
+</details>
 
 ### 🏞️ Image Generation (Stable-Diffusion-as-a-Service)
 ---
 Build MidJourney discord bots in seconds.
 
 <img src="docs/assets/hippo_with_glasses_sdxl.jpg" width="400">
 
-<table>
-<tr>
-<td> gRPC API ⚡ </td>
-<td> REST API </td>
-</tr>
-<tr>
-<td>
+<br>
+<details>
+<summary> API / Usage</summary>
+<br>
+
+<b>gRPC API ⚡</b>
 
 ```python
 from nos.client import Client
@@ -125,9 +112,7 @@ image, = sdxl(prompts=["hippo with glasses in a library, cartoon styling"],
               width=1024, height=1024, num_images=1)
 ```
 
-</td>
-<td>
-
+<b>REST API</b>
 
 ```bash
 curl \
@@ -137,32 +122,26 @@ curl \
     "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
     "inputs": {
         "prompts": ["hippo with glasses in a library, cartoon styling"],
-        "width": 1024,
-        "height": 1024,
+        "width": 1024, "height": 1024,
         "num_images": 1
     }
 }'
 ```
 
-</td>
-</tr>
-</table>
-
-
+</details>
 
 ### 🧠 Text & Image Embedding (CLIP-as-a-Service)
 ---
-Build scalable semantic search of images/videos in minutes.
+Build [scalable semantic search of images/videos](https://docs.nos.run/docs/demos/video-search.html) in minutes.
 
 <img src="docs/assets/embedding.png" width="400">
 
-<table>
-<tr>
-<td> gRPC API ⚡ </td>
-<td> REST API </td>
-</tr>
-<tr>
-<td>
+<br>
+<details>
+<summary> API / Usage</summary>
+<br>
+
+<b>gRPC API ⚡</b>
 
 ```python
 from nos.client import Client
@@ -173,8 +152,7 @@ clip = client.Module("openai/clip-vit-base-patch32")
 txt_vec = clip.encode_text(texts=["fox jumped over the moon"])
 ```
 
-</td>
-<td>
+<b>REST API</b>
 
 ```bash
 curl \
@@ -189,26 +167,21 @@ curl \
 }'
 ```
 
-</td>
-</tr>
-</table>
-
+</details>
 
 
 ### 🎙️ Audio Transcription (Whisper-as-a-Service)
 ---
-
-Perform real-time audio transcription using Whisper.
+Perform [real-time audio transcription](./examples/tutorials/04-serving-multiple-models/) using Whisper.
 
 <img src="docs/assets/transcription.png" width="400">
 
-<table>
-<tr>
-<td> gRPC API ⚡ </td>
-<td> REST API </td>
-</tr>
-<tr>
-<td>
+<br>
+<details>
+<summary> API / Usage</summary>
+<br>
+
+<b>gRPC API ⚡</b>
 
 ```python
 from pathlib import Path
@@ -222,8 +195,7 @@ with client.UploadFile(Path("audio.wav")) as remote_path:
 # {"chunks": ...}
 ```
 
-</td>
-<td>
+<b>REST API</b>
 
 ```bash
 curl \
@@ -234,24 +206,20 @@ curl \
 -F 'file=@audio.wav'
 ```
 
-</td>
-</tr>
-</table>
+</details>
 
 ### 🧐 Object Detection (YOLOX-as-a-Service)
 ---
-
 Run classical computer-vision tasks in 2 lines of code.
 
 <img src="docs/assets/bench_park_detections.png" width="400">
 
-<table>
-<tr>
-<td> gRPC API ⚡ </td>
-<td> REST API </td>
-</tr>
-<tr>
-<td>
+<br>
+<details>
+<summary> API / Usage</summary>
+<br>
+
+<b>gRPC API ⚡</b>
 
 ```python
 from pathlib import Path
@@ -263,8 +231,7 @@ model = client.Module("yolox/medium")
 response = model(images=[Image.open("image.jpg")])
 ```
 
-</td>
-<td>
+<b>REST API</b>
 
 ```bash
 curl \
@@ -275,24 +242,12 @@ curl \
 -F 'file=@image.jpg'
 ```
 
-</td>
-</tr>
-</table>
-
+</details>
 
 ### ⚒️ Custom models
 ---
 Want to run models not supported by NOS? You can easily add your own models following the examples in the [NOS Playground](https://github.com/autonomi-ai/nos-playground/tree/main/examples).
 
-
-## 📚 Documentation
-
-- [Tutorials](./examples/tutorials/)
-- [Quickstart](https://docs.nos.run/docs/quickstart.html)
-- [Models](https://docs.nos.run/docs/models/supported-models.html)
-- **Concepts**: [Architecture Overview](https://docs.nos.run/docs/concepts/architecture-overview.html), [ModelSpec](https://docs.nos.run/docs/concepts/model-spec.html), [ModelManager](https://docs.nos.run/docs/concepts/model-manager.html), [Runtime Environments](https://docs.nos.run/docs/concepts/runtime-environments.html)
-- **Demos**: [Building a Discord Image Generation Bot](https://docs.nos.run/docs/demos/discord-bot.html), [Video Search Demo](https://docs.nos.run/docs/demos/video-search.html)
-
 ## 📄 License
 
 This project is licensed under the [Apache-2.0 License](LICENSE).
@@ -304,7 +259,7 @@ NOS collects anonymous usage data using [Sentry](https://sentry.io/). This is us
 ## 🤝 Contributing
 We welcome contributions! Please see our [contributing guide](CONTRIBUTING.md) for more information.
 
-### 🔗  Quick Links
+## 🔗 Quick Links
 
 * 💬 Send us an email at [[email protected]](mailto:[email protected]) or join our [Discord](https://discord.gg/QAGgvTuvgg) for help.
 * 📣 Follow us on [Twitter](https://twitter.com/autonomi\_ai), and [LinkedIn](https://www.linkedin.com/company/autonomi-ai) to keep up-to-date on our products.
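
The README's chat example sends `"stream": true` to the OpenAI-compatible `/v1/chat/completions` endpoint but does not show how to read the streamed reply. A minimal standard-library sketch follows. It assumes NOS emits OpenAI-style server-sent events (`data: {...}` lines whose JSON carries `choices[0].delta.content`, ending with `data: [DONE]`); that wire format is an assumption based on the "OpenAI-compatible" claim, and `iter_stream_text` and `stream_chat` are hypothetical helper names, not part of the NOS client.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent events (assumed wire format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


def stream_chat(
    prompt: str,
    url: str = "http://localhost:8000/v1/chat/completions",
    model: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
) -> Iterator[str]:
    """Send a streaming chat request and yield the reply text as it arrives."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "stream": True,
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # HTTPResponse iterates over the body one line (bytes) at a time.
        yield from iter_stream_text(response)
```

With the server from `nos serve up --http` running, `for piece in stream_chat("Tell me a story"): print(piece, end="", flush=True)` prints the reply incrementally. Keeping the event parsing in its own function means it can be tested without a live server.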

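The README's embedding section encodes text with CLIP (`clip.encode_text(...)`) and pitches it for semantic search over images and videos. Once query and item embeddings exist, the search step is a ranking by cosine similarity. The sketch below shows only that step in plain Python. It assumes the embeddings have already been converted to sequences of floats (the return type of `encode_text` is not shown in the README), and `cosine_similarity` and `top_k` are hypothetical helpers, not NOS APIs.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (item_id, score) pairs most similar to `query`, best first."""
    scored = ((item_id, cosine_similarity(query, vec)) for item_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

This is a linear scan, so each query costs one similarity computation per stored item; for large collections you would typically pre-normalize the vectors and move the lookup into a dedicated vector index.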
diff --git a/docs/blog/assets/nos-inf2.jpg b/docs/blog/assets/nos-inf2.jpg