
zibojia/Cross_Modal_Retrieval

Cross_Modal_Retrieval

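| Image Dataset | Size | Image Dataset | Size |
| --- | --- | --- | --- |
| COCO | 123K | Flickr | 31K |
| SBU | 1M | VG | 108K |
| CC3M | 3.3M | CUB | 11K |
| Multi30K | 151K | CC14M | 14M |
| Fashion-Gen | 293K | XTD | 10K |
| Amazon reviews | 14M | | |

| Video Dataset | Size | Video Dataset | Size |
| --- | --- | --- | --- |
| MSRVTT | 10K | MSVD | 2K |
| LSMDC | 118K | DiDeMo | 8K |
| YouCook2 | 2K | YFCC | 100M |
| CrossTask | 4K | HowTo100M | 100M |
| VATEX | 41K | Mining YouTube | 20K |
| WebVid2M | 2M | QueryYD | 1K |
| VGGSound | 200K | LiveBot | 2K |
| Kinetics-700 | 650K | FCVID | 91K |
| ActivityNet | 20K | | |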

2022

| Task | Paper | Dataset | Pretraining | Year |
| --- | --- | --- | --- | --- |
| Image Retrieval | Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval [link] | CC3M, Multi30K, COCO, XTD, MSRVTT | Yes | 2022 |
| Video Retrieval | Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval [link] | MSRVTT, MSVD, DiDeMo, LSMDC | No | 2022 |
| Video Retrieval | Everything at Once – Multi-modal Fusion Transformer for Video Retrieval [link] | HowTo100M, Web, CrossTask, Mining YouTube | Yes | 2022 |
| Video Retrieval | Cross Modal Retrieval with Querybank Normalisation [link] | MSRVTT, LSMDC, MSVD, VATEX, QueryYD, DiDeMo | No | 2022 |
| Image Retrieval | A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval [link] | COCO, CUB, Flickr | No | 2022 |
| Image Retrieval | EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval [link] | Fashion-Gen, Amazon reviews | Yes | 2022 |
| Image Retrieval | COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [link] | CC3M, CC14M, SBU, VG, COCO, Flickr | Yes | 2022 |
| Image Retrieval | ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval [link] | VG, Flickr, TC, CTC | No | 2022 |
| Video Retrieval | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval [link] | MSVD, LSMDC, MSRVTT | No | 2022 |
| Video Retrieval | ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound [link] | WebVid2M, VGGSound | Yes | 2022 |
| Video Retrieval | VTC: Improving Video-Text Retrieval with User Comments [link] | LiveBot, Kinetics-700 | Yes | 2022 |
| Video Retrieval | Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval [link] | FCVID, ActivityNet, YFCC | No | 2022 |
| Video Retrieval | MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval [link] | CC3M, WebVid2M, MSVD, LSMDC, DiDeMo | Yes | 2022 |
| Video Retrieval | Multi-Query Video Retrieval [link] | MSRVTT, MSVD, VATEX | No | 2022 |
| Video Retrieval | Selective Query-guided Debiasing for Video Corpus Moment Retrieval [link] | TVR, ActivityNet, DiDeMo | No | 2022 |
| Video Retrieval | TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval [link] | MSRVTT, VATEX, LSMDC, ActivityNet, DiDeMo | No | 2022 |
| Video Retrieval | Learning Linguistic Association Towards Efficient Text-Video Retrieval [link] | MSRVTT, MSVD, VATEX | No | 2022 |
| Image Retrieval | CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval [link] | COCO, Flickr | No | 2022 |
| Image Retrieval | Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [link] | COCO, Flickr, CC14K | No | 2021 |
| Image Retrieval | Learning with Noisy Correspondence for Cross-modal Matching [link] | COCO, Flickr, CC152K | No | 2022 |
| Image Retrieval | Probabilistic Embeddings for Cross-Modal Retrieval [link] | COCO, CUB | No | 2021 |
| Video Retrieval | TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [link] | MSRVTT | No | 2021 |
| Video Retrieval | Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval [link] | COCO, MSRVTT, MSVD, LSMDC | No | 2021 |
| Image Retrieval | Learning Cross-Modal Retrieval with Noisy Labels [link] | Wikipedia, INRIA-Websearch, NUS-WIDE, XMediaNet | Yes | 2021 |
| Image Retrieval | Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [link] | Recipe1M | Yes | 2021 |
| Image Retrieval | Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query [link] | VG | No | 2021 |
| Image Retrieval | Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining [link] | Product1M | Yes | 2021 |
| Image Retrieval | Wasserstein Coupled Graph Learning for Cross-Modal Retrieval [link] | Flickr, COCO, Real World Scene Graph, MovieGraphs | No | 2021 |
| Image Retrieval | Deep Hash Distillation for Image Retrieval [link] | ImageNet, NUS-WIDE, COCO | Yes | 2021 |
| Video Retrieval | Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval [link] | MSRVTT, TGIF, MSVD, VATEX | No | 2021 |
| Video Retrieval | CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching [link] | YouCook2, MSRVTT, HowTo100M, CrossTask, Mining YouTube | Yes | 2020 |
| Image Retrieval | IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [link] | KWAI-AD, Flickr, COCO | No | 2020 |
| Video Retrieval | Multi-modal Transformer for Video Retrieval [link] | MSRVTT, ActivityNet, LSMDC | No | 2020 |
| Image Retrieval | Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [link] | Politics, GoodNews, CC, COCO | Yes | 2020 |
| Image Retrieval | Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval [link] | Fashion200k | Yes | 2020 |
