michelecafagna26 / cider
Pythonic wrappers for the CIDEr/CIDEr-D evaluation metrics. Provides CIDEr as well as CIDEr-D, a variant that is more robust to gaming effects. Also adds the option to replace the original PTBTokenizer with the spaCy tokenizer (no Java dependency, but slower).
☆13 · Updated last month
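The metric this repo wraps is, at its core, TF-IDF-weighted n-gram cosine similarity between a candidate caption and a set of reference captions, averaged over n-gram orders 1–4 and scaled by 10. Below is a minimal, self-contained sketch of that idea; it computes IDF over the reference set only (a stand-in for the full corpus statistics real implementations use) and is not this repo's actual API or implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cider(candidate, references, max_n=4):
    """Toy CIDEr: mean over n of mean TF-IDF cosine similarity to each reference."""
    num_docs = len(references)
    score = 0.0
    for n in range(1, max_n + 1):
        ref_counters = [ngrams(r.split(), n) for r in references]
        cand_counter = ngrams(candidate.split(), n)

        # Document frequency of each n-gram across the reference set.
        df = Counter()
        for rc in ref_counters:
            for gram in rc:
                df[gram] += 1

        def tfidf(counter):
            total = sum(counter.values()) or 1
            # Smoothed IDF so unseen grams do not divide by zero.
            return {g: (c / total) * math.log((num_docs + 1) / (df[g] + 1))
                    for g, c in counter.items()}

        cand_vec = tfidf(cand_counter)
        sims = []
        for rc in ref_counters:
            ref_vec = tfidf(rc)
            dot = sum(cand_vec.get(g, 0.0) * ref_vec.get(g, 0.0)
                      for g in set(cand_vec) | set(ref_vec))
            norm = (math.sqrt(sum(v * v for v in cand_vec.values()))
                    * math.sqrt(sum(v * v for v in ref_vec.values())))
            sims.append(dot / norm if norm else 0.0)
        score += sum(sims) / num_docs
    return 10.0 * score / max_n  # CIDEr conventionally scales by 10

refs = ["a cat sits on the mat", "a cat is sitting on a mat"]
good = cider("a cat sits on the mat", refs)   # matches a reference exactly
bad = cider("the dog runs in the park", refs)  # shares almost nothing
```

As expected, a candidate that reproduces a reference scores well above an unrelated sentence, since higher-order n-gram overlap drops to zero for the latter. CIDEr-D adds further safeguards (length penalties, count clipping) to resist gaming, which this sketch omits.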
Alternatives and similar repositories for cider
Users interested in cider are comparing it to the libraries listed below.
- Repo for the paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models" ☆12 · Updated last year
- ☆11 · Updated last year
- Code for "Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models" ☆92 · Updated last year
- Official code for the paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆177 · Updated last year
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆18 · Updated last year
- [ECCV'24] Official implementation of the Autoregressive Visual Entity Recognizer ☆14 · Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆90 · Updated last year
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆49 · Updated 7 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision; visualizes attention on image feat… ☆47 · Updated last year
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models ☆139 · Updated 2 years ago
- Official code of IdealGPT ☆35 · Updated 2 years ago
- Evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆159 · Updated 4 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models ☆85 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆84 · Updated 3 months ago
- Preference Learning for LLaVA ☆58 · Updated last year
- ☆102 · Updated 2 years ago
- ☆50 · Updated 2 years ago
- ☆17 · Updated last year
- Official GitHub repo of G-LLaVA ☆148 · Updated 11 months ago
- ☆12 · Updated last year
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆63 · Updated last year
- Official implementation of "Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning" ☆28 · Updated last year
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning ☆134 · Updated 2 years ago
- ☆14 · Updated last year
- ☆155 · Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning" ☆70 · Updated last year
- Datasets and models for multimodal puzzle reasoning ☆113 · Updated 11 months ago
- ☆88 · Updated last year
- [EMNLP 2023] InfoSeek: A New VQA Benchmark Focused on Visual Info-Seeking Questions ☆25 · Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆72 · Updated last year