Add multimodal LLM support #16782
Replies: 4 comments 2 replies
-
Hi, there are several add-ons that can already do this. A key issue is that these services may require Internet access (and some do) unless compact local models become usable at some point, provided the hardware can handle them; that is getting easier thanks to neural processing units and other AI accelerators. Thanks.
-
See this add-on for more details:
-
Personally, I dream of an assistant that understands sighted instructions like "look at the top/bottom/left/right" on web sites and in applications, something that drives me crazy every time, and that perhaps allows interacting with images (like clicking a point on a map). But these situations often involve personal context, so any cloud service is unfortunately a privacy hole. And as far as I know, models that run on a basic PC are not capable enough at the moment.
-
Yes, but an assistant does not actually have to be an AI model. It could be a combination of NVDA and Voice Access, or a text-based alternative to it, in Windows 11. No cloud needed.
-
Hi,
Considering the recent developments in LLMs, especially multimodal ones, how about integrating a function that provides a summary or overview of the visible image, or a specific section of it, at a single keypress?
Users could enter their own API key for a model of their choice (e.g., GPT-4, Gemini Pro, or Claude 3.5) and use the function from then on.
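For illustration, here is a minimal sketch of what such a keypress-triggered function could look like as a standalone script, assuming the OpenAI Python SDK and Pillow. The model name, prompt, and environment variable are assumptions made for the sketch, not anything NVDA or this proposal specifies:

```python
# Minimal sketch: capture the screen, send it to a multimodal model,
# return a short description. Assumes `pip install openai pillow`.
import base64
import io
import os

from openai import OpenAI
from PIL import ImageGrab  # screen capture on Windows/macOS


def describe_screen(region=None) -> str:
    """Capture the full screen (or a (left, top, right, bottom) region)
    and ask a multimodal model to describe it."""
    image = ImageGrab.grab(bbox=region)  # None = entire screen

    # Encode the capture as a base64 PNG data URL, the inline-image
    # format the chat completions API accepts.
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    data_url = "data:image/png;base64," + base64.b64encode(
        buffer.getvalue()).decode("ascii")

    # The user supplies their own key, e.g. via an environment variable.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any multimodal model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Briefly describe this screenshot for a blind user."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # In NVDA this would be bound to a gesture; here we just print.
    print(describe_screen())
```

Inside an actual NVDA add-on, the capture region would presumably come from the current navigator object and the result would be spoken rather than printed, but the round trip to the model would look much the same.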