Jiaxuan-Li/EVCap


[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

☆64

Alternatives and similar repositories for EVCap

Users who are interested in EVCap are comparing it to the repositories listed below.

joeyz0z / MeaCap
(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
☆56 · Aug 16, 2024 · Updated last year
RitaRamo / smallcap
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
☆125 · Feb 13, 2024 · Updated 2 years ago
FeiElysia / ViECap
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023
☆167 · Sep 9, 2024 · Updated last year
taewhankim / VIPCAP
☆15 · Dec 31, 2024 · Updated last year
DingchenYang99 / Pensieve
The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"
☆15 · May 4, 2024 · Updated 2 years ago
Saehyung-Lee / PlugIR
Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)
☆34 · Mar 24, 2025 · Updated last year
hanghuacs / FineCaption
☆39 · Jun 20, 2025 · Updated last year
GeWu-Lab / Patch-Matters
[CVPR2025] Code Release of Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
☆25 · Jun 17, 2025 · Updated last year
junyangwang0410 / Knight
SotA text-only image/video method (IJCAI 2023)
☆14 · Jan 9, 2024 · Updated 2 years ago
aimagelab / DiCO
[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
☆20 · Sep 11, 2024 · Updated last year
ghchen18 / acl23_mclip
The official code and model for ACL 2023 paper 'mCLIP: Multilingual CLIP via Cross-lingual Transfer'
☆10 · Jan 23, 2024 · Updated 2 years ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆81 · Oct 25, 2024 · Updated last year
Go2Heart / EchoSight
[EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
☆90 · Jan 19, 2026 · Updated 6 months ago
Paranioar / RCAR
[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”
☆34 · Apr 11, 2024 · Updated 2 years ago
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆15 · May 13, 2025 · Updated last year
edchengg / infoseek_eval
EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions
☆26 · May 30, 2024 · Updated 2 years ago
zhuang-li / FactualSceneGraph
[ACL 2023 Findings] FACTUAL dataset, the textual scene graph parser trained on FACTUAL.
☆131 · Jun 15, 2026 · Updated last month
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆76 · Sep 20, 2023 · Updated 2 years ago
FeiElysia / awesome-zero-shot-captioning
A curated list of zero-shot captioning papers
☆24 · Aug 26, 2023 · Updated 2 years ago
xfactlab / I0T
[ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap
☆12 · Jun 18, 2025 · Updated last year
duyngtr16061999 / KDMCSE
☆10 · Apr 7, 2024 · Updated 2 years ago
mzhaoshuai / RLCF
[ICLR 2024] Test-Time RL with CLIP Feedback for Vision-Language Models.
☆102 · Oct 20, 2025 · Updated 9 months ago
xinwei666 / MMGenerativeIR
Official Code of our AAAI-24 Paper: "Generative Multi-modal Knowledge Retrieval with Large Language Models".
☆28 · Sep 15, 2025 · Updated 10 months ago
Letian2003 / C-VQA
Counterfactual Reasoning VQA Dataset
☆28 · Nov 23, 2023 · Updated 2 years ago
jmiemirza / MMFM-Challenge
Official repository for the MMFM challenge
☆26 · Jun 18, 2024 · Updated 2 years ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆172 · Jul 30, 2024 · Updated last year
CrossmodalGroup / LAPS
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment, CVPR, 2024
☆110 · Jun 26, 2025 · Updated last year
jacobswan1 / ViTCAP
Implementation for CVPR 2022 paper "Injecting Semantic Concepts into End-to-End Image Captioning".
☆43 · May 28, 2022 · Updated 4 years ago
mlvlab / RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆47 · Sep 12, 2024 · Updated last year
daqingliu / coco-caption
A python3 version of coco-caption with spice.
☆20 · Dec 28, 2019 · Updated 6 years ago
leolee99 / PAU
[NeurIPS 2023] The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" acce…
☆28 · May 14, 2024 · Updated 2 years ago
aimagelab / ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
☆56 · Jul 14, 2025 · Updated last year
AdaptVision / AdaptVision
[CVPR 2026] AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
☆40 · Apr 27, 2026 · Updated 3 months ago
ExplainableML / finer
[CVPR 2026 Oral] FINER: MLLMs Hallucinate under Fine-grained Negative Queries
☆17 · Jul 6, 2026 · Updated 3 weeks ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25 · May 14, 2026 · Updated 2 months ago
yuhangzang / UPT
☆61 · May 2, 2025 · Updated last year
tsunghan-wu / reverse_vlm
🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…
☆58 · Jan 22, 2026 · Updated 6 months ago
youngkyunJang / VDG
Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024
☆21 · May 30, 2024 · Updated 2 years ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆52 · Jun 16, 2025 · Updated last year