aimagelab / PMA-Net

[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.

☆19

Alternatives and similar repositories for PMA-Net

Users who are interested in PMA-Net are comparing it to the repositories listed below.

joeyz0z / MeaCap
(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
☆53 · Updated last year
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆63 · Updated last year
XLiu443 / Tem-adapter
[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
☆37 · Updated 2 years ago
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆15 · Updated 6 months ago
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆64 · Updated 4 months ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41 · Updated 7 months ago
JacobYuan7 / RLIP
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Grap…
☆78 · Updated last year
AlonMendelson / SGVL
☆16 · Updated last year
Tanveer81 / RGNet
This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos
☆17 · Updated 9 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64 · Updated last year
showlab / GEB-Plus
[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
☆16 · Updated 3 years ago
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆60 · Updated last year
chunmeifeng / SPRC
【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval
☆91 · Updated last year
allenai / reclip
☆87 · Updated 3 years ago
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆54 · Updated last year
TalalWasim / Vita-CLIP
Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]
☆127 · Updated 2 years ago
visinf / veto
Vision Relation Transformer for Unbiased Scene Graph Generation (ICCV 2023)
☆22 · Updated 2 years ago
bladewaltz1 / PromptSwitch
☆30 · Updated 2 years ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 · Updated 2 years ago
sail-sg / ptp
[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
☆152 · Updated 2 years ago
zjucsq / PLA
[ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision
☆12 · Updated 2 years ago
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆62 · Updated last year
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆53 · Updated 8 months ago
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆61 · Updated last year
Yuqifan1117 / CaCao
This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"…
☆48 · Updated last year
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" (NeurIPS 2023 Spotlight)
☆37 · Updated 2 years ago
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆75 · Updated 2 years ago
jy0205 / STCAT
[NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
☆53 · Updated last year
thunlp / PEVL
Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”
☆48 · Updated 3 years ago
kingthreestones / RefCLIP
☆38 · Updated 2 years ago