aimagelab / awesome-captioning-evaluationLinks

[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

☆22

Alternatives and similar repositories for awesome-captioning-evaluation

Users that are interested in awesome-captioning-evaluation are comparing it to the libraries listed below

Sorting:

RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆88Updated last year
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43Updated 5 months ago
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆64Updated 4 months ago
keio-smilab24 / Polos
[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
☆32Updated 6 months ago
mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆286Updated 2 years ago
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆66Updated last year
amazon-science / QA-ViT
☆69Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆93Updated last year
vinid / neg_clip
NegCLIP.
☆38Updated 2 years ago
jmiemirza / MMFM-Challenge
Official repository for the MMFM challenge
☆25Updated last year
SivanDoveh / DAC
Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models
☆27Updated 2 years ago
ys-zong / VL-ICL
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆66Updated 2 months ago
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆31Updated last week
McGill-NLP / imagecode
Code and data for ImageCoDe, a contextual vison-and-language benchmark
☆41Updated last year
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142Updated last year
google / haloquest
☆23Updated last year
hendryx-scale / mhal-detect
M-HalDetect Dataset Release
☆26Updated 2 years ago
IntelLabs / lvlm-interpret
☆108Updated 8 months ago
nickjiang2378 / vlm-hallucinations
[ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
☆92Updated this week
YiyangZhou / LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆154Updated last year
navervision / lincir
Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)
☆137Updated last year
changdaeoh / multimodal-mixup
Official implementation for NeurIPS'23 paper "Geodesic Multi-Modal Mixup for Robust Fine-Tuning"
☆35Updated last year
ExplainableML / Vision_by_Language
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
☆81Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
mrwu-mac / R-Bench
[ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'
☆22Updated 11 months ago
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆202Updated last year
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆89Updated last year
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆82Updated 9 months ago
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆61Updated last year
gyxxyg / VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆115Updated 11 months ago