[CVPR 2024] Official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
language
computer-vision
vision
clip
image-retrieval
fine-grained
robustness
text-retrieval
multimodal
compositionality
vision-language
vision-language-model
cvpr2024
compostional
Updated Nov 12, 2024 - Jupyter Notebook