[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V-level capabilities and beyond.
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
An open-source implementation for training LLaVA-NeXT.
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
🧘🏻‍♂️ KarmaVLM (相生): A family of high-efficiency, powerful visual language models.
Multimodal Instruction Tuning for Llama 3
Build a simple, basic multimodal large model from scratch. 🤖 (See the toy architecture sketch after this list.)
[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding
PyTorch implementation of OpenAI's CLIP model for image classification, visual search, and visual question answering (VQA). (A zero-shot classification sketch follows this list.)
Efficient Video Question Answering
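For the "from scratch" entry above, here is a toy sketch of the LLaVA-style recipe such repositories typically follow: a small projector maps frozen vision-encoder features into a language model's embedding space, and the projected image tokens are prepended to the text tokens. All dimensions, module names, and the tiny Transformer trunk below are illustrative assumptions, not any listed repository's actual code.

```python
# Toy LLaVA-style multimodal LM: vision features -> projector -> LM token space.
# Every dimension and name here is an illustrative assumption.
import torch
import torch.nn as nn

class TinyMultimodalLM(nn.Module):
    def __init__(self, vision_dim=768, lm_dim=512, vocab_size=32000):
        super().__init__()
        # Projector: maps frozen vision-encoder features into the LM embedding space.
        self.projector = nn.Linear(vision_dim, lm_dim)
        self.tok_embed = nn.Embedding(vocab_size, lm_dim)
        # Stand-in for a decoder-only LM trunk (causal mask applied in forward).
        layer = nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, vision_feats, input_ids):
        # vision_feats: (B, num_patches, vision_dim) from a frozen vision encoder
        img_tokens = self.projector(vision_feats)           # (B, P, lm_dim)
        txt_tokens = self.tok_embed(input_ids)              # (B, T, lm_dim)
        seq = torch.cat([img_tokens, txt_tokens], dim=1)    # image tokens first
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1)).to(seq.device)
        return self.lm_head(self.trunk(seq, mask=mask))     # (B, P+T, vocab_size)

model = TinyMultimodalLM()
logits = model(torch.randn(1, 16, 768), torch.randint(0, 32000, (1, 8)))
print(logits.shape)  # torch.Size([1, 24, 32000])
```

In real implementations the projector is trained first while both the vision encoder and LM stay frozen, then the LM is unfrozen for instruction tuning.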
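And for the CLIP entry, a minimal zero-shot image classification sketch using the original openai/clip package (an assumption; the listed repository's own API may differ). The model name, labels, and image path are placeholders.

```python
# Zero-shot classification with openai/clip (pip install git+https://github.com/openai/CLIP.git).
# "example.jpg" and the label prompts are placeholder assumptions.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # Forward pass returns image->text and text->image similarity logits.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```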