Object Detection From Webcam Stream Guide #9336

Merged
merged 42 commits into from
Sep 19, 2024
Commits
cfe4247
guides
freddyaboulton Sep 12, 2024
3a28eac
Add demo
freddyaboulton Sep 12, 2024
dadda8c
guide
freddyaboulton Sep 12, 2024
c9130ed
Merge branch '5.0-dev' into object-detection-guide
freddyaboulton Sep 12, 2024
3d54d4e
Add info about Powershell client (#9343)
abidlabs Sep 12, 2024
a9b5182
Remove lite/theme.css from the Git-managed file tree (#9335)
whitphx Sep 13, 2024
f7f7885
9227 chatinterface retry bug (#9316)
freddyaboulton Sep 13, 2024
e57f086
Move icons into `IconButtonWrapper` (#9261)
hannahblair Sep 13, 2024
c8510c1
Added gradio-in-r (#9340)
Ifeanyi55 Sep 13, 2024
2e034c6
Enhance Lite E2E tests and fix a networking problem on Lite (#9333)
whitphx Sep 13, 2024
4ddb5db
Do not attach `content_disposition_type = "attachment"` headers for f…
abidlabs Sep 13, 2024
d04ab18
Fix overflowing markdown in Chatbot (#9260)
hannahblair Sep 13, 2024
bae18df
demo name
freddyaboulton Sep 16, 2024
ee3a05a
Merge branch '5.0-dev' into object-detection-guide
freddyaboulton Sep 16, 2024
500f4e7
Guide on Streaming Video for Object Detection (#9365)
freddyaboulton Sep 18, 2024
029e310
Small tweak to how thoughts are shown in `gr.Chatbot` (#9359)
abidlabs Sep 16, 2024
dc05f53
Use `container` param in `gr.Markdown` (#9356)
hannahblair Sep 16, 2024
1cc71c3
small fixes (#9347)
julien-c Sep 13, 2024
7c5d26e
Updated Guide: Real Time Speech Recognition (#9349)
Nik-Kras Sep 16, 2024
b9e5b3e
chunk space uploads (#9360)
pngwn Sep 17, 2024
4d41c80
add find (#9368)
aliabd Sep 17, 2024
bdc9e95
New branch (#9369)
aliabd Sep 17, 2024
74eba65
New branch (#9370)
aliabd Sep 17, 2024
9dc7bb6
run format
hannahblair Sep 17, 2024
ee0ae3c
Testing CI (#9379)
aliabd Sep 18, 2024
69b5fdc
Fixes website build in 5.0-dev (#9382)
aliabd Sep 18, 2024
633e75c
Small tweaks to improve the DX for the "tuples"/"messages" argument i…
abidlabs Sep 18, 2024
7a725c4
Update babylon.js to `v7` for `gr.Model3D` (#9377)
abidlabs Sep 18, 2024
498996e
Fix `gr.ImageEditor` toolbar cutoff (#9371)
hannahblair Sep 18, 2024
deef3b7
add lite upload (#9385)
aliabd Sep 18, 2024
f4b335c
fix sha (#9386)
aliabd Sep 18, 2024
9d017ae
Fix lite ci (#9387)
aliabd Sep 18, 2024
788f5cb
Add code
freddyaboulton Sep 18, 2024
3b9019b
feedback
freddyaboulton Sep 18, 2024
f69b7fe
merge latest
freddyaboulton Sep 18, 2024
5552aca
link
freddyaboulton Sep 18, 2024
96fc032
add changeset
gradio-pr-bot Sep 18, 2024
a7fc03d
code
freddyaboulton Sep 18, 2024
43fe4df
check
freddyaboulton Sep 18, 2024
e7ce4c5
Update guides/04_additional-features/02_streaming-outputs.md
abidlabs Sep 19, 2024
0fd2b0d
Update guides/07_streaming/02_object-detection-from-webcam.md
abidlabs Sep 19, 2024
dd97fb5
Merge branch '5.0-dev' into object-detection-guide
abidlabs Sep 19, 2024
1 change: 1 addition & 0 deletions demo/yolov10_webcam_stream/run.ipynb
@@ -0,0 +1 @@
{"cells": [{"cell_type": "markdown", "id": "302934307671667531413257853548643485645", "metadata": {}, "source": ["# Gradio Demo: yolov10_webcam_stream"]}, {"cell_type": "code", "execution_count": null, "id": "272996653310673477252411125948039410165", "metadata": {}, "outputs": [], "source": ["!pip install -q gradio "]}, {"cell_type": "code", "execution_count": null, "id": "288918539441861185822528903084949547379", "metadata": {}, "outputs": [], "source": ["import gradio as gr\n", "\n", "from ultralytics import YOLOv10\n", "\n", "model = YOLOv10.from_pretrained(\"jameslahm/yolov10n\")\n", "\n", "\n", "def yolov10_inference(image, conf_threshold):\n", " width, _ = image.size\n", " import time\n", "\n", " start = time.time()\n", " results = model.predict(source=image, imgsz=width, conf=conf_threshold)\n", " end = time.time()\n", " annotated_image = results[0].plot()\n", " print(\"time\", end - start)\n", " return annotated_image[:, :, ::-1]\n", "\n", "\n", "css = \"\"\".my-group {max-width: 600px !important; max-height: 600 !important;}\n", " .my-column {display: flex !important; justify-content: center !important; align-items: center !important};\"\"\"\n", "\n", "\n", "with gr.Blocks(css=css) as app:\n", " gr.HTML(\n", " \"\"\"\n", " <h1 style='text-align: center'>\n", " YOLOv10 Webcam Stream\n", " </h1>\n", " \"\"\"\n", " )\n", " gr.HTML(\n", " \"\"\"\n", " <h3 style='text-align: center'>\n", " <a href='https://arxiv.org/abs/2405.14458' target='_blank'>arXiv</a> | <a href='https://github.com/THU-MIG/yolov10' target='_blank'>github</a>\n", " </h3>\n", " \"\"\"\n", " )\n", " with gr.Column(elem_classes=[\"my-column\"]):\n", " with gr.Group(elem_classes=[\"my-group\"]):\n", " image = gr.Image(type=\"pil\", label=\"Image\", sources=\"webcam\")\n", " conf_threshold = gr.Slider(\n", " label=\"Confidence Threshold\",\n", " minimum=0.0,\n", " maximum=1.0,\n", " step=0.05,\n", " value=0.30,\n", " )\n", " image.stream(\n", " fn=yolov10_inference,\n", " inputs=[image, 
conf_threshold],\n", " outputs=[image],\n", " stream_every=0.1,\n", " time_limit=30,\n", " )\n", "\n", "if __name__ == \"__main__\":\n", " app.launch()\n"]}], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
58 changes: 58 additions & 0 deletions demo/yolov10_webcam_stream/run.py
@@ -0,0 +1,58 @@
import gradio as gr

from ultralytics import YOLOv10

model = YOLOv10.from_pretrained("jameslahm/yolov10n")


def yolov10_inference(image, conf_threshold):
    width, _ = image.size
    import time

    start = time.time()
    results = model.predict(source=image, imgsz=width, conf=conf_threshold)
    end = time.time()
    annotated_image = results[0].plot()
    print("time", end - start)
    return annotated_image[:, :, ::-1]


css = """.my-group {max-width: 600px !important; max-height: 600px !important;}
.my-column {display: flex !important; justify-content: center !important; align-items: center !important;}"""


with gr.Blocks(css=css) as app:
    gr.HTML(
        """
        <h1 style='text-align: center'>
        YOLOv10 Webcam Stream
        </h1>
        """
    )
    gr.HTML(
        """
        <h3 style='text-align: center'>
        <a href='https://arxiv.org/abs/2405.14458' target='_blank'>arXiv</a> | <a href='https://github.com/THU-MIG/yolov10' target='_blank'>github</a>
        </h3>
        """
    )
    with gr.Column(elem_classes=["my-column"]):
        with gr.Group(elem_classes=["my-group"]):
            image = gr.Image(type="pil", label="Image", sources="webcam")
            conf_threshold = gr.Slider(
                label="Confidence Threshold",
                minimum=0.0,
                maximum=1.0,
                step=0.05,
                value=0.30,
            )
        image.stream(
            fn=yolov10_inference,
            inputs=[image, conf_threshold],
            outputs=[image],
            stream_every=0.1,
            time_limit=30,
        )

if __name__ == "__main__":
    app.launch()
93 changes: 93 additions & 0 deletions guides/07_streaming/02_object-detection-from-webcam.md
@@ -0,0 +1,93 @@
# Object Detection from a Webcam Stream

Tags: VISION, STREAMING, WEBCAM

In this guide, we'll use YOLOv10 to do near real-time object detection in Gradio from a user's webcam.
Along the way, we'll be using the latest streaming features introduced in Gradio 5.0.

## Setting up the Model

First, we'll follow the installation instructions for [Yolov10n](https://huggingface.co/jameslahm/yolov10n) on the Hugging Face Hub.

Run `pip install git+https://github.com/THU-MIG/yolov10.git` in your virtual environment.

Then, we'll download the model from the Hub (`ultralytics` is the library we've just installed).

```python
from ultralytics import YOLOv10

model = YOLOv10.from_pretrained('jameslahm/yolov10n')
```

We are using the `yolov10-n` variant because it has the lowest latency of the YOLOv10 variants. See the [Performance](https://github.com/THU-MIG/yolov10?tab=readme-ov-file#performance) section of the README in the GitHub repository.


## The Inference Function

Our inference function will accept a PIL image from the webcam as well as a desired confidence threshold.
Object detection models like YOLO identify many objects and assign a confidence score to each one. The lower the confidence, the higher the chance of a false positive, so we will let our users adjust the confidence threshold.
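
To make the trade-off concrete, here is a minimal sketch of what a confidence threshold does, using made-up detections. The `detections` list and the `filter_by_confidence` helper are hypothetical and exist only for illustration; in the actual demo, the filtering happens inside `model.predict` through its `conf` argument.

```python
# Hypothetical detections: (label, confidence) pairs invented for illustration.
detections = [("person", 0.91), ("cup", 0.42), ("dog", 0.18)]


def filter_by_confidence(items, conf_threshold):
    """Keep only the detections whose confidence is at least the threshold."""
    return [(label, conf) for label, conf in items if conf >= conf_threshold]


# A low threshold keeps more boxes, including likely false positives.
permissive = filter_by_confidence(detections, 0.10)
# A high threshold keeps only the detections the model is most sure about.
strict = filter_by_confidence(detections, 0.50)

print(len(permissive), len(strict))
```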

```python
def yolov10_inference(image, conf_threshold):
    width, _ = image.size
    results = model.predict(source=image, imgsz=width, conf=conf_threshold)
    annotated_image = results[0].plot()
    return annotated_image[:, :, ::-1]
```

We will use the `plot` method to draw a bounding box around each detected object. YOLOv10 assumes images are in the BGR color format, so we reverse the channel order of the annotated image to get the RGB format that web browsers expect.
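
The channel flip can be sketched without any image library. The example below uses a hypothetical 1x2 image stored as nested lists of `[B, G, R]` values; on a NumPy-style array such as the one the inference function slices, `[:, :, ::-1]` performs the same reversal along the last axis.

```python
# Hypothetical 1x2 image: one row, two pixels, each stored as [B, G, R].
bgr_image = [[[255, 0, 0], [0, 0, 255]]]  # a pure blue pixel, then a pure red pixel

# Reverse the channel order of every pixel: [B, G, R] -> [R, G, B].
rgb_image = [[pixel[::-1] for pixel in row] for row in bgr_image]

print(rgb_image)
```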

## The Gradio Demo

The Gradio demo will be pretty straightforward but we'll do a couple of things that are specific to streaming:

* The user's webcam will be both an input and an output. That way, the user will only see their stream with the detected objects.
* We'll use the `time_limit` and `stream_every` parameters of the `stream` event. The `time_limit` parameter caps how long we'll process each user's stream, and the `stream_every` parameter controls how frequently webcam frames are sent to the server.
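
As a back-of-the-envelope sketch of how these two parameters interact, a stream that sends one frame every `stream_every` seconds for `time_limit` seconds produces roughly `time_limit / stream_every` frames per session. The `estimated_frames` helper is hypothetical and not part of Gradio, and it assumes both values are expressed in seconds.

```python
def estimated_frames(time_limit, stream_every):
    """Rough number of frames sent during one streaming session.

    Assumes both arguments are in seconds; this is an estimate, not Gradio's own accounting.
    """
    return round(time_limit / stream_every)


# The values used in this guide's demo: stream_every=0.1 and time_limit=30.
frames = estimated_frames(time_limit=30, stream_every=0.1)
print(frames)
```

Lowering `stream_every` should make the output feel smoother, at the cost of more inference calls for the server to handle.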

In addition, we'll apply some custom CSS so that the webcam and slider are centered on the page.

```python
import gradio as gr

# Cap the size of the webcam group and center the column on the page
css = """.my-group {max-width: 600px !important; max-height: 600px !important;}
.my-column {display: flex !important; justify-content: center !important; align-items: center !important;}"""


with gr.Blocks(css=css) as app:
    gr.HTML(
        """
        <h1 style='text-align: center'>
        YOLOv10 Webcam Stream
        </h1>
        """)
    gr.HTML(
        """
        <h3 style='text-align: center'>
        <a href='https://arxiv.org/abs/2405.14458' target='_blank'>arXiv</a> | <a href='https://github.com/THU-MIG/yolov10' target='_blank'>github</a>
        </h3>
        """)
    with gr.Column(elem_classes=["my-column"]):
        with gr.Group(elem_classes=["my-group"]):
            image = gr.Image(type="pil", label="Image", sources="webcam")
            conf_threshold = gr.Slider(
                label="Confidence Threshold",
                minimum=0.0,
                maximum=1.0,
                step=0.05,
                value=0.30,
            )
        # The same Image component is both the input and the output of the stream
        image.stream(
            fn=yolov10_inference,
            inputs=[image, conf_threshold],
            outputs=[image],
            stream_every=0.1,  # how often a webcam frame is sent to the server
            time_limit=30  # how long each user's stream is processed
        )
```
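
The `run.py` demo shown earlier in this diff wraps `model.predict` in `time.time()` calls and prints the elapsed time. A minimal sketch of the same check is shown below, with a hypothetical `fake_inference` function standing in for the model so that it runs without any dependencies. If the measured latency is consistently above `stream_every`, the server will likely fall behind the incoming frames.

```python
import time

STREAM_EVERY = 0.1  # the stream_every value used in the demo


def fake_inference(frame):
    """Hypothetical stand-in for model.predict; sleeps briefly instead of running a model."""
    time.sleep(0.01)
    return frame


def measure_latency(fn, frame):
    """Return the wall-clock seconds taken by a single call to fn(frame)."""
    start = time.perf_counter()
    fn(frame)
    return time.perf_counter() - start


latency = measure_latency(fake_inference, frame=None)
keeps_up = latency < STREAM_EVERY
print(f"latency={latency:.3f}s keeps_up={keeps_up}")
```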


## Conclusion

You can check out our demo hosted on Hugging Face Spaces [here](https://huggingface.co/spaces/gradio/YOLOv10-webcam-stream).

It is also embedded on this page below:

$demo_YOLOv10-webcam-stream