michelecafagna26 / cider
Pythonic wrappers for the CIDEr/CIDEr-D evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended), which is more robust to gaming effects. Also adds the option to replace the original PTBTokenizer with the spaCy tokenizer (no Java dependency, but slower).
☆13 Updated last year
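As a rough illustration of what the wrapped metric computes (this is a toy sketch of CIDEr's core idea, not this library's API), CIDEr scores a candidate caption by TF-IDF-weighted n-gram cosine similarity against the references, averaged over n-gram orders (1–4) and over references, then scaled by 10. All function names below are illustrative; the real implementation also applies count clipping and, for CIDEr-D, a length penalty and Gaussian weighting.

```python
from collections import Counter
from math import log, sqrt

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cider(candidate, references, corpus, n_max=4):
    """Toy CIDEr sketch. `corpus` is a list of reference lists (one per
    image) used to estimate IDF; `references` are the refs for this image."""
    num_images = len(corpus)
    score = 0.0
    for n in range(1, n_max + 1):
        # Document frequency: number of images whose references contain the n-gram.
        df = Counter()
        for refs in corpus:
            seen = set()
            for ref in refs:
                seen |= set(ngrams(ref.split(), n))
            df.update(seen)

        def tfidf(tokens):
            # TF-IDF vector over n-grams; grams unseen in the corpus are dropped.
            counts = ngrams(tokens, n)
            return {g: c * log(num_images / df[g]) for g, c in counts.items() if df[g]}

        cand_vec = tfidf(candidate.split())
        sims = []
        for ref in references:
            ref_vec = tfidf(ref.split())
            dot = sum(cand_vec.get(g, 0.0) * w for g, w in ref_vec.items())
            norm = (sqrt(sum(v * v for v in cand_vec.values()))
                    * sqrt(sum(v * v for v in ref_vec.values())))
            sims.append(dot / norm if norm else 0.0)
        score += sum(sims) / len(sims)  # average cosine similarity over references
    return 10.0 * score / n_max        # average over n-gram orders, scaled by 10
```

A candidate identical to one reference scores high, while a caption sharing no corpus n-grams scores zero, which is the behavior the TF-IDF weighting is designed to produce.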
Alternatives and similar repositories for cider
Users that are interested in cider are comparing it to the libraries listed below
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 Updated 9 months ago
- ☆12 Updated last year
- ☆27 Updated 8 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆57 Updated 9 months ago
- Official Repository of LatentSeek ☆54 Updated last month
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space' ☆15 Updated 11 months ago
- ☆76 Updated last year
- ☆46 Updated 8 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆77 Updated 8 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆66 Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆86 Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆86 Updated last year
- ☆16 Updated 8 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆124 Updated last month
- A Self-Training Framework for Vision-Language Reasoning ☆80 Updated 5 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆76 Updated last year
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated this week
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆54 Updated 8 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆54 Updated last year
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆46 Updated last month
- A hot-pluggable tool for visualizing LLaVA's attention. ☆20 Updated last year
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding" ☆68 Updated 4 months ago
- Codes for "ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding" [ICML 2025] ☆35 Updated last week
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆19 Updated 9 months ago
- ☆52 Updated last year
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆72 Updated last month
- Large Language Models Can Self-Improve in Long-context Reasoning ☆71 Updated 7 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆78 Updated last month
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆45 Updated last month
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆71 Updated 2 weeks ago