michelecafagna26 / cider
Pythonic wrappers for the CIDEr and CIDEr-D evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended), which is more robust to gaming effects. It also adds the option to replace the original PTBTokenizer with the spaCy tokenizer (no Java dependency, but slower).
☆12 · Updated last year
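As a rough illustration of what the metric measures, here is a minimal sketch of plain CIDEr: TF-IDF-weighted n-gram vectors compared by cosine similarity, averaged over the references and over n = 1..4. It is not this repository's API. All function names below are hypothetical, tokenization is a simple lowercase whitespace split (an assumption, standing in for PTBTokenizer or spaCy), and the CIDEr-D extras (count clipping, the Gaussian length penalty, and the final scaling) are left out.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count n-grams in a lowercased, whitespace-tokenized sentence (assumed tokenization)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def document_frequency(corpus_refs, n):
    """For each n-gram, count how many images have it in at least one reference."""
    df = Counter()
    for refs in corpus_refs:
        seen = set()
        for ref in refs:
            seen.update(ngram_counts(ref, n))
        df.update(seen)
    return df


def tfidf_vector(text, n, df, num_images):
    """Weight raw n-gram counts by IDF = log(num_images / max(1, df))."""
    return {
        gram: count * math.log(num_images / max(1, df[gram]))
        for gram, count in ngram_counts(text, n).items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cider_sketch(candidate, references, corpus_refs, max_n=4):
    """Simplified CIDEr for one candidate caption.

    candidate:   caption to score
    references:  reference captions for the same image
    corpus_refs: reference captions for every image (list of lists), used for IDF
    """
    num_images = len(corpus_refs)
    total = 0.0
    for n in range(1, max_n + 1):
        df = document_frequency(corpus_refs, n)
        cand_vec = tfidf_vector(candidate, n, df, num_images)
        sims = [
            cosine(cand_vec, tfidf_vector(ref, n, df, num_images))
            for ref in references
        ]
        total += sum(sims) / len(sims)
    return total / max_n


# Toy corpus with two images and one reference caption each.
corpus_refs = [["a cat sits on the mat"], ["a dog runs in the park"]]
print(cider_sketch("a cat sits on the mat", corpus_refs[0], corpus_refs))
print(cider_sketch("a dog runs in the park", corpus_refs[0], corpus_refs))
```

Because n-grams that occur in every image get an IDF of zero, words such as "a" and "the" contribute nothing in this toy corpus, so the second caption earns no credit against the first image's reference.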
Alternatives and similar repositories for cider:
Users who are interested in cider are comparing it to the libraries listed below:
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆11 · Updated 4 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆59 · Updated 2 weeks ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆43 · Updated 6 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆116 · Updated 7 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆29 · Updated 7 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆61 · Updated 5 months ago
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆79 · Updated 9 months ago
- A hot-pluggable tool for visualizing LLaVA's attention. ☆13 · Updated last year
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆32 · Updated 2 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆49 · Updated 4 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆127 · Updated 4 months ago
- Repo for paper: https://arxiv.org/abs/2404.06479 ☆25 · Updated 4 months ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆141 · Updated 9 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆37 · Updated 4 months ago
- An RLHF Infrastructure for Vision-Language Models ☆162 · Updated 3 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 7 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆71 · Updated 3 weeks ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆77 · Updated 7 months ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆37 · Updated 8 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆40 · Updated 11 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆17 · Updated 2 weeks ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆42 · Updated 3 months ago