Commit
Object Detection From Webcam Stream Guide (#9336)
* guides * Add demo * guide * Add info about Powershell client (#9343) * clients * add changeset --------- Co-authored-by: gradio-pr-bot <[email protected]> * Remove lite/theme.css from the Git-managed file tree (#9335) * Delete js/lite/src/theme.css from the Git managed file tree as it's dynamically generated * Remove lite-related npm scripts from spa/package.json * add changeset --------- Co-authored-by: gradio-pr-bot <[email protected]> * 9227 chatinterface retry bug (#9316) * first draft * add code * tip * add changeset * delete dead code * Type check notebook * consolidate like section with guide * Add comments * add value * Lint * lint * guide --------- Co-authored-by: gradio-pr-bot <[email protected]> Co-authored-by: Abubakar Abid <[email protected]> * Move icons into `IconButtonWrapper` (#9261) * * update icon buttons * add image editor specific icon button * tweak hover * margin tweak * add changeset * improve gr.Video button UI * radius tweak * ensure even spacing * fix typechecks * add changeset * revert irrelevant changes * typefix * fix image editor buttons * fix download link icon * disable undo if no change events dispatched in model3d and video * use icons with iconbuttonwrapper * add iconbuttonwrapper around gallery share btn * Revert "add iconbuttonwrapper around gallery share btn" This reverts commit 4605302. 
* add changeset * design fixes * add changeset * move status tracker progress to bottom of component * add changeset * use iconbutton for like/dislike * fix lint error * fix type errors * type errors * fix test * revert undo icon change * btn spacing --------- Co-authored-by: gradio-pr-bot <[email protected]> * Added gradio-in-r (#9340) * Added gradio-in-r * add changeset * section * remove * tweaks * delete changeset * R * Updated using-gradio-in-other-programming-languages.md --------- Co-authored-by: Abubakar Abid <[email protected]> Co-authored-by: gradio-pr-bot <[email protected]> * Enhance Lite E2E tests and fix a networking problem on Lite (#9333) * Add Lite E2E test to check a matplotlib problem which was fixed in #9312 * Restore js/app/test/image_remote_url.spec.ts, which was deleted in #8716 * Fix tootils import * Format * Fix processing_utils.resolve_with_google_dns to use the HTTPX client instead of urllib so it works on Lite * add changeset * add changeset * Move js/app/test/image_remote_url.spec.ts -> js/spa/test/image_remote_url.spec.ts * Use pyodide.http in resolve_with_google_dns on Lite --------- Co-authored-by: gradio-pr-bot <[email protected]> * Do not attach `content_disposition_type = "attachment"` headers for files explicitly allowed by developer (#9348) * changes * add changeset * format * fix type * type * add test --------- Co-authored-by: gradio-pr-bot <[email protected]> * Fix overflowing markdown in Chatbot (#9260) * fix markdown overflowing table * add changeset * revert undo icon * add changeset * Revert "revert undo icon" This reverts commit 855b012. 
* add changeset --------- Co-authored-by: gradio-pr-bot <[email protected]> * demo name * Guide on Streaming Video for Object Detection (#9365) * Add code * notebooks * Suggestions * Add gif * Small tweak to how thoughts are shown in `gr.Chatbot` (#9359) * thiknk chat * add changeset * lint --------- Co-authored-by: gradio-pr-bot <[email protected]> * Use `container` param in `gr.Markdown` (#9356) * * add param * add story * add changeset * Use IconButton for copy btn * fix test --------- Co-authored-by: gradio-pr-bot <[email protected]> * small fixes (#9347) * Updated Guide: Real Time Speech Recognition (#9349) * Update real-time-speech-recognition.md added necessary dependency * Update run.py updated code to handle cases with stereo microphone * Update real-time-speech-recognition.md improved english * Update run.py updated code for streaming * Update run.py * chunk space uploads (#9360) * chunk space uploads * Update upload_demo_to_space.py Co-authored-by: Lucain <[email protected]> * address comments + tweak CI --------- Co-authored-by: Lucain <[email protected]> * add find (#9368) * New branch (#9369) * add find * fix syntax * New branch (#9370) * add find * fix syntax * add hidden files * run format * Testing CI (#9379) * remove unnecessary redirects * add changeset * fix * formatting --------- Co-authored-by: gradio-pr-bot <[email protected]> * Fixes website build in 5.0-dev (#9382) * changes * add changeset --------- Co-authored-by: gradio-pr-bot <[email protected]> * Small tweaks to improve the DX for the "tuples"/"messages" argument in `gr.Chatbot` (#9358) * change format * format * add changeset * revert * revert --------- Co-authored-by: gradio-pr-bot <[email protected]> * Update babylon.js to `v7` for `gr.Model3D` (#9377) * update package.json * add changeset * add changeset * update pnpm lock * add changeset --------- Co-authored-by: gradio-pr-bot <[email protected]> * Fix `gr.ImageEditor` toolbar cutoff (#9371) * fix wrap alignment * add changeset 
--------- Co-authored-by: gradio-pr-bot <[email protected]> * add lite upload (#9385) * fix sha (#9386) * Fix lite ci (#9387) * fix sha * fix name * fix name * Add code * feedback * link * add changeset * code * check * Update guides/04_additional-features/02_streaming-outputs.md * Update guides/07_streaming/02_object-detection-from-webcam.md --------- Co-authored-by: Abubakar Abid <[email protected]> Co-authored-by: gradio-pr-bot <[email protected]> Co-authored-by: Yuichiro Tachibana (Tsuchiya) <[email protected]> Co-authored-by: Hannah <[email protected]> Co-authored-by: Ifeanyi Idiaye <[email protected]> Co-authored-by: Julien Chaumond <[email protected]> Co-authored-by: Nikita Krasnytskyi <[email protected]> Co-authored-by: pngwn <[email protected]> Co-authored-by: Lucain <[email protected]> Co-authored-by: Ali Abdalla <[email protected]>
1 parent 3ad28c7, commit 736046f
Showing 57 changed files with 511 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
"gradio": minor | ||
--- | ||
|
||
feat:Object Detection From Webcam Stream Guide |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
from PIL import ImageDraw, ImageFont # type: ignore | ||
import colorsys | ||
|
||
|
||
def get_color(label): | ||
    # Simple hash function to generate consistent colors for each label | ||
    hash_value = hash(label) | ||
    hue = (hash_value % 100) / 100.0 | ||
    saturation = 0.7 | ||
    value = 0.9 | ||
    rgb = colorsys.hsv_to_rgb(hue, saturation, value) | ||
    return tuple(int(x * 255) for x in rgb) | ||
|
||
|
||
def draw_bounding_boxes(image, results: dict, model, threshold=0.3): | ||
    draw = ImageDraw.Draw(image) | ||
    font = ImageFont.load_default() | ||
|
||
    for score, label_id, box in zip( | ||
        results["scores"], results["labels"], results["boxes"] | ||
    ): | ||
        if score > threshold: | ||
            label = model.config.id2label[label_id.item()] | ||
            box = [round(i, 2) for i in box.tolist()] | ||
            color = get_color(label) | ||
|
||
            # Draw bounding box | ||
            draw.rectangle(box, outline=color, width=3)  # type: ignore | ||
|
||
            # Prepare text | ||
            text = f"{label}: {score:.2f}" | ||
            text_bbox = draw.textbbox((0, 0), text, font=font) | ||
            text_width = text_bbox[2] - text_bbox[0] | ||
            text_height = text_bbox[3] - text_bbox[1] | ||
|
||
            # Draw text background | ||
            draw.rectangle( | ||
                [box[0], box[1] - text_height - 4, box[0] + text_width, box[1]],  # type: ignore | ||
                fill=color,  # type: ignore | ||
            ) | ||
|
||
            # Draw text | ||
            draw.text((box[0], box[1] - text_height - 4), text, fill="white", font=font) | ||
|
||
    return image |
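
`get_color` above derives the hue from Python's built-in `hash()`. For `str` inputs that hash is salted per interpreter process unless `PYTHONHASHSEED` is fixed, so a label keeps its color within one run but may get a different one after a restart. A minimal sketch of a run-independent variant, assuming a `hashlib` digest is an acceptable substitute; `get_stable_color` is a hypothetical name and not part of this commit:

```python
import colorsys
import hashlib


def get_stable_color(label: str) -> tuple:
    # Hypothetical variant of get_color (not in the commit): derive the hue
    # from a SHA-256 digest so the same label maps to the same color in
    # every process, independent of PYTHONHASHSEED.
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    hue = (int.from_bytes(digest[:4], "big") % 100) / 100.0
    # Same fixed saturation and value as the original helper.
    rgb = colorsys.hsv_to_rgb(hue, 0.7, 0.9)
    return tuple(int(x * 255) for x in rgb)
```

The return shape matches `get_color` (an RGB tuple of ints in 0..255), so it could be swapped in wherever `draw_bounding_boxes` calls `get_color`.
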
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
safetensors==0.4.3 | ||
opencv-python | ||
torch | ||
transformers>=4.43.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"cells": [{"cell_type": "markdown", "id": "302934307671667531413257853548643485645", "metadata": {}, "source": ["# Gradio Demo: rt-detr-object-detection"]}, {"cell_type": "code", "execution_count": null, "id": "272996653310673477252411125948039410165", "metadata": {}, "outputs": [], "source": ["!pip install -q gradio safetensors==0.4.3 opencv-python torch transformers>=4.43.0"]}, {"cell_type": "code", "execution_count": null, "id": "288918539441861185822528903084949547379", "metadata": {}, "outputs": [], "source": ["# Downloading files from the demo repo\n", "import os\n", "!wget -q https://github.com/gradio-app/gradio/raw/main/demo/rt-detr-object-detection/draw_boxes.py"]}, {"cell_type": "code", "execution_count": null, "id": "44380577570523278879349135829904343037", "metadata": {}, "outputs": [], "source": ["import spaces\n", "import gradio as gr\n", "import cv2\n", "from PIL import Image\n", "import torch\n", "import time\n", "import numpy as np\n", "import uuid\n", "\n", "from transformers import RTDetrForObjectDetection, RTDetrImageProcessor # type: ignore\n", "\n", "from draw_boxes import draw_bounding_boxes\n", "\n", "image_processor = RTDetrImageProcessor.from_pretrained(\"PekingU/rtdetr_r50vd\")\n", "model = RTDetrForObjectDetection.from_pretrained(\"PekingU/rtdetr_r50vd\").to(\"cuda\")\n", "\n", "\n", "SUBSAMPLE = 2\n", "\n", "\n", "@spaces.GPU\n", "def stream_object_detection(video, conf_threshold):\n", " cap = cv2.VideoCapture(video)\n", "\n", " video_codec = cv2.VideoWriter_fourcc(*\"mp4v\") # type: ignore\n", " fps = int(cap.get(cv2.CAP_PROP_FPS))\n", "\n", " desired_fps = fps // SUBSAMPLE\n", " width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) // 2\n", " height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) // 2\n", "\n", " iterating, frame = cap.read()\n", "\n", " n_frames = 0\n", "\n", " name = f\"output_{uuid.uuid4()}.mp4\"\n", " segment_file = cv2.VideoWriter(name, video_codec, desired_fps, (width, height)) # type: ignore\n", " batch = []\n", "\n", " 
while iterating:\n", " frame = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)\n", " frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n", " if n_frames % SUBSAMPLE == 0:\n", " batch.append(frame)\n", " if len(batch) == 2 * desired_fps:\n", " inputs = image_processor(images=batch, return_tensors=\"pt\").to(\"cuda\")\n", "\n", " print(f\"starting batch of size {len(batch)}\")\n", " start = time.time()\n", " with torch.no_grad():\n", " outputs = model(**inputs)\n", " end = time.time()\n", " print(\"time taken for inference\", end - start)\n", "\n", " start = time.time()\n", " boxes = image_processor.post_process_object_detection(\n", " outputs,\n", " target_sizes=torch.tensor([(height, width)] * len(batch)),\n", " threshold=conf_threshold,\n", " )\n", "\n", " for _, (array, box) in enumerate(zip(batch, boxes)):\n", " pil_image = draw_bounding_boxes(\n", " Image.fromarray(array), box, model, conf_threshold\n", " )\n", " frame = np.array(pil_image)\n", " # Convert RGB to BGR\n", " frame = frame[:, :, ::-1].copy()\n", " segment_file.write(frame)\n", "\n", " batch = []\n", " segment_file.release()\n", " yield name\n", " end = time.time()\n", " print(\"time taken for processing boxes\", end - start)\n", " name = f\"output_{uuid.uuid4()}.mp4\"\n", " segment_file = cv2.VideoWriter(\n", " name, video_codec, desired_fps, (width, height)\n", " ) # type: ignore\n", "\n", " iterating, frame = cap.read()\n", " n_frames += 1\n", "\n", "\n", "with gr.Blocks() as demo:\n", " gr.HTML(\n", " \"\"\"\n", " <h1 style='text-align: center'>\n", " Video Object Detection with <a href='https://huggingface.co/PekingU/rtdetr_r101vd_coco_o365' target='_blank'>RT-DETR</a>\n", " </h1>\n", " \"\"\"\n", " )\n", " with gr.Row():\n", " with gr.Column():\n", " video = gr.Video(label=\"Video Source\")\n", " conf_threshold = gr.Slider(\n", " label=\"Confidence Threshold\",\n", " minimum=0.0,\n", " maximum=1.0,\n", " step=0.05,\n", " value=0.30,\n", " )\n", " with gr.Column():\n", " output_video = gr.Video(\n", " 
label=\"Processed Video\", streaming=True, autoplay=True\n", " )\n", "\n", " video.upload(\n", " fn=stream_object_detection,\n", " inputs=[video, conf_threshold],\n", " outputs=[output_video],\n", " )\n", "\n", "if __name__ == \"__main__\":\n", " demo.launch()\n"]}], "metadata": {}, "nbformat": 4, "nbformat_minor": 5} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
import spaces | ||
import gradio as gr | ||
import cv2 | ||
from PIL import Image | ||
import torch | ||
import time | ||
import numpy as np | ||
import uuid | ||
|
||
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor # type: ignore | ||
|
||
from draw_boxes import draw_bounding_boxes | ||
|
||
image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd") | ||
model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd").to("cuda") | ||
|
||
|
||
SUBSAMPLE = 2 | ||
|
||
|
||
@spaces.GPU | ||
def stream_object_detection(video, conf_threshold): | ||
    cap = cv2.VideoCapture(video) | ||
|
||
    video_codec = cv2.VideoWriter_fourcc(*"mp4v")  # type: ignore | ||
    fps = int(cap.get(cv2.CAP_PROP_FPS)) | ||
|
||
    desired_fps = fps // SUBSAMPLE | ||
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) // 2 | ||
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) // 2 | ||
|
||
    iterating, frame = cap.read() | ||
|
||
    n_frames = 0 | ||
|
||
    name = f"output_{uuid.uuid4()}.mp4" | ||
    segment_file = cv2.VideoWriter(name, video_codec, desired_fps, (width, height))  # type: ignore | ||
    batch = [] | ||
|
||
    while iterating: | ||
        frame = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5) | ||
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) | ||
        if n_frames % SUBSAMPLE == 0: | ||
            batch.append(frame) | ||
        if len(batch) == 2 * desired_fps: | ||
            inputs = image_processor(images=batch, return_tensors="pt").to("cuda") | ||
|
||
            print(f"starting batch of size {len(batch)}") | ||
            start = time.time() | ||
            with torch.no_grad(): | ||
                outputs = model(**inputs) | ||
            end = time.time() | ||
            print("time taken for inference", end - start) | ||
|
||
            start = time.time() | ||
            boxes = image_processor.post_process_object_detection( | ||
                outputs, | ||
                target_sizes=torch.tensor([(height, width)] * len(batch)), | ||
                threshold=conf_threshold, | ||
            ) | ||
|
||
            for _, (array, box) in enumerate(zip(batch, boxes)): | ||
                pil_image = draw_bounding_boxes( | ||
                    Image.fromarray(array), box, model, conf_threshold | ||
                ) | ||
                frame = np.array(pil_image) | ||
                # Convert RGB to BGR | ||
                frame = frame[:, :, ::-1].copy() | ||
                segment_file.write(frame) | ||
|
||
            batch = [] | ||
            segment_file.release() | ||
            yield name | ||
            end = time.time() | ||
            print("time taken for processing boxes", end - start) | ||
            name = f"output_{uuid.uuid4()}.mp4" | ||
            segment_file = cv2.VideoWriter( | ||
                name, video_codec, desired_fps, (width, height) | ||
            )  # type: ignore | ||
|
||
        iterating, frame = cap.read() | ||
        n_frames += 1 | ||
|
||
|
||
with gr.Blocks() as demo: | ||
    gr.HTML( | ||
        """ | ||
    <h1 style='text-align: center'> | ||
    Video Object Detection with <a href='https://huggingface.co/PekingU/rtdetr_r101vd_coco_o365' target='_blank'>RT-DETR</a> | ||
    </h1> | ||
    """ | ||
    ) | ||
    with gr.Row(): | ||
        with gr.Column(): | ||
            video = gr.Video(label="Video Source") | ||
            conf_threshold = gr.Slider( | ||
                label="Confidence Threshold", | ||
                minimum=0.0, | ||
                maximum=1.0, | ||
                step=0.05, | ||
                value=0.30, | ||
            ) | ||
        with gr.Column(): | ||
            output_video = gr.Video( | ||
                label="Processed Video", streaming=True, autoplay=True | ||
            ) | ||
|
||
    video.upload( | ||
        fn=stream_object_detection, | ||
        inputs=[video, conf_threshold], | ||
        outputs=[output_video], | ||
    ) | ||
|
||
if __name__ == "__main__": | ||
    demo.launch() |
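
The loop in `stream_object_detection` keeps every `SUBSAMPLE`-th frame and emits a segment once `2 * desired_fps` frames have accumulated, which is roughly two seconds of output video. Frames left over when the input ends never fill a batch, so they are not written to any segment. A minimal sketch of the same batching pattern with a final flush, written in plain Python so it runs without OpenCV; `batch_frames` and its parameters are hypothetical names for illustration and are not part of this commit:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_frames(
    frames: Iterable[T], subsample: int, batch_size: int
) -> Iterator[List[T]]:
    """Keep every `subsample`-th frame and yield lists of `batch_size` frames.

    Hypothetical helper: unlike the loop in the demo, a trailing partial
    batch is also yielded once the input is exhausted.
    """
    if subsample < 1 or batch_size < 1:
        raise ValueError("subsample and batch_size must be positive")
    batch: List[T] = []
    for index, frame in enumerate(frames):
        if index % subsample == 0:
            batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the frames that did not fill a whole batch.
        yield batch


# Ten frames, keep every 2nd, batches of 2: the last batch is partial.
print(list(batch_frames(range(10), subsample=2, batch_size=2)))
# → [[0, 2], [4, 6], [8]]
```

In the demo, each yielded batch would correspond to one inference call and one written video segment; whether the short tail segment is worth streaming is a design choice this sketch does not settle.
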
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
safetensors==0.4.3 | ||
git+https://github.com/THU-MIG/yolov10.git |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"cells": [{"cell_type": "markdown", "id": "302934307671667531413257853548643485645", "metadata": {}, "source": ["# Gradio Demo: yolov10_webcam_stream"]}, {"cell_type": "code", "execution_count": null, "id": "272996653310673477252411125948039410165", "metadata": {}, "outputs": [], "source": ["!pip install -q gradio safetensors==0.4.3 git+https://github.com/THU-MIG/yolov10.git"]}, {"cell_type": "code", "execution_count": null, "id": "288918539441861185822528903084949547379", "metadata": {}, "outputs": [], "source": ["import gradio as gr\n", "\n", "from ultralytics import YOLOv10\n", "\n", "model = YOLOv10.from_pretrained(\"jameslahm/yolov10n\")\n", "\n", "\n", "def yolov10_inference(image, conf_threshold):\n", " width, _ = image.size\n", " import time\n", "\n", " start = time.time()\n", " results = model.predict(source=image, imgsz=width, conf=conf_threshold)\n", " end = time.time()\n", " annotated_image = results[0].plot()\n", " print(\"time\", end - start)\n", " return annotated_image[:, :, ::-1]\n", "\n", "\n", "css = \"\"\".my-group {max-width: 600px !important; max-height: 600 !important;}\n", " .my-column {display: flex !important; justify-content: center !important; align-items: center !important};\"\"\"\n", "\n", "\n", "with gr.Blocks(css=css) as app:\n", " gr.HTML(\n", " \"\"\"\n", " <h1 style='text-align: center'>\n", " <a href='https://github.com/THU-MIG/yolov10' target='_blank'>YOLO V10</a> Webcam Stream Object Detection\n", " </h1>\n", " \"\"\"\n", " )\n", " with gr.Column(elem_classes=[\"my-column\"]):\n", " with gr.Group(elem_classes=[\"my-group\"]):\n", " image = gr.Image(type=\"pil\", label=\"Image\", sources=\"webcam\")\n", " conf_threshold = gr.Slider(\n", " label=\"Confidence Threshold\",\n", " minimum=0.0,\n", " maximum=1.0,\n", " step=0.05,\n", " value=0.30,\n", " )\n", " image.stream(\n", " fn=yolov10_inference,\n", " inputs=[image, conf_threshold],\n", " outputs=[image],\n", " stream_every=0.1,\n", " time_limit=30,\n", " )\n", "\n", "if 
__name__ == \"__main__\":\n", " app.launch()\n"]}], "metadata": {}, "nbformat": 4, "nbformat_minor": 5} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
import gradio as gr | ||
|
||
from ultralytics import YOLOv10 | ||
|
||
model = YOLOv10.from_pretrained("jameslahm/yolov10n") | ||
|
||
|
||
def yolov10_inference(image, conf_threshold): | ||
    width, _ = image.size | ||
    import time | ||
|
||
    start = time.time() | ||
    results = model.predict(source=image, imgsz=width, conf=conf_threshold) | ||
    end = time.time() | ||
    annotated_image = results[0].plot() | ||
    print("time", end - start) | ||
    return annotated_image[:, :, ::-1] | ||
|
||
|
||
css = """.my-group {max-width: 600px !important; max-height: 600px !important;} | ||
         .my-column {display: flex !important; justify-content: center !important; align-items: center !important;}""" | ||
|
||
|
||
with gr.Blocks(css=css) as app: | ||
    gr.HTML( | ||
        """ | ||
    <h1 style='text-align: center'> | ||
    <a href='https://github.com/THU-MIG/yolov10' target='_blank'>YOLO V10</a> Webcam Stream Object Detection | ||
    </h1> | ||
    """ | ||
    ) | ||
    with gr.Column(elem_classes=["my-column"]): | ||
        with gr.Group(elem_classes=["my-group"]): | ||
            image = gr.Image(type="pil", label="Image", sources="webcam") | ||
            conf_threshold = gr.Slider( | ||
                label="Confidence Threshold", | ||
                minimum=0.0, | ||
                maximum=1.0, | ||
                step=0.05, | ||
                value=0.30, | ||
            ) | ||
        image.stream( | ||
            fn=yolov10_inference, | ||
            inputs=[image, conf_threshold], | ||
            outputs=[image], | ||
            stream_every=0.1, | ||
            time_limit=30, | ||
        ) | ||
|
||
if __name__ == "__main__": | ||
    app.launch() |
File renamed without changes.