njucckevin / CapArena
An Arena-style Automated Evaluation Benchmark for Detailed Captioning
☆34Updated this week
Alternatives and similar repositories for CapArena
Users that are interested in CapArena are comparing it to the libraries listed below
- A Self-Training Framework for Vision-Language Reasoning☆80Updated 4 months ago
- ☆74Updated last year
- ☆19Updated last month
- A Survey on the Honesty of Large Language Models☆57Updated 5 months ago
- [ACL 2025] A Neural-Symbolic Self-Training Framework☆109Updated this week
- ☆45Updated last month
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆107Updated 3 weeks ago
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆46Updated last year
- This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi…☆107Updated 7 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆50Updated 7 months ago
- This repository continuously collects the latest papers, technical reports, and benchmarks on multimodal reasoning.☆41Updated 2 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆60Updated 5 months ago
- ☆77Updated 4 months ago
- A comprehensive collection of process reward models.☆88Updated 2 weeks ago
- ☆60Updated 2 weeks ago
- The official code repository for PRMBench.☆73Updated 3 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"☆32Updated 10 months ago
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality☆31Updated last month
- An RLHF Infrastructure for Vision-Language Models☆176Updated 6 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆50Updated 7 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆74Updated 6 months ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆95Updated last year
- Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"☆26Updated last month
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆63Updated 10 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆34Updated 10 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆97Updated 6 months ago
- ☆169Updated this week
- ☆54Updated 2 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆54Updated 6 months ago
- ☆46Updated 7 months ago