joeyz0z / MeaCapLinks

(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning

☆53

Alternatives and similar repositories for MeaCap

Users that are interested in MeaCap are comparing it to the libraries listed below

Sorting:

ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆64Updated last year
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆54Updated last year
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆60Updated last year
chunmeifeng / SPRC
【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval
☆92Updated last year
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64Updated last year
aimagelab / PMA-Net
[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
☆19Updated last year
bladewaltz1 / PromptSwitch
☆30Updated 2 years ago
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆83Updated last year
jpthu17 / HBI
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
☆123Updated last year
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆61Updated last year
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆64Updated last year
huangmozhi9527 / GMMFormer
[AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
☆20Updated last year
TalalWasim / Vita-CLIP
Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]
☆127Updated 2 years ago
leolee99 / PAU
The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS…
☆27Updated last year
vinid / neg_clip
NegCLIP.
☆38Updated 2 years ago
Show-han / Zeroshot_REC
Official code for Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions (CVPR 2024)
☆27Updated last year
Ziyang412 / UCoFiA
Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆66Updated last year
icq-benchmark / icq-benchmark
☆20Updated 5 months ago
hrtang22 / MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"
☆26Updated 10 months ago
ThomasWangY / 2024-AAAI-HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)
☆73Updated 10 months ago
jy0205 / STCAT
[NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
☆53Updated last year
yellow-binary-tree / HawkEye
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆45Updated last year
Pter61 / context-i2w
Context-I2W: Mapping Images to Context-dependent words for Accurate Zero-Shot Composed Image Retrieval [AAAI 2024 Oral]
☆55Updated 7 months ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41Updated 8 months ago
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆15Updated 7 months ago
mbzuai-oryx / CVRR-Evaluation-Suite
[CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…
☆50Updated last year
HJYao00 / Side4Video
☆42Updated last year
Cuberick-Orion / Bi-Blip4CIR
The official implementation for BLIP4CIR with bi-directional training | Bi-directional Training for Composed Image Retrieval via Text Pro…
☆33Updated last year
kingthreestones / RefCLIP
☆39Updated 2 years ago
ruc-aimc-lab / TeachCLIP
[CVPR 2024] TeachCLIP for Text-to-Video Retrieval
☆42Updated 7 months ago