OpenDFM / MULTI-Benchmark
MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images
☆34Updated last month
Alternatives and similar repositories for MULTI-Benchmark:
Users interested in MULTI-Benchmark are comparing it to the repositories listed below
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆45Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆37Updated 4 months ago
- A Self-Training Framework for Vision-Language Reasoning☆71Updated 2 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆41Updated 3 weeks ago
- ☆68Updated 2 months ago
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark"☆45Updated last month
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆55Updated last month
- ☆22Updated this week
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆56Updated 3 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆54Updated 5 months ago
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆35Updated 2 weeks ago
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 3 months ago
- Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal …☆46Updated last month
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆43Updated 4 months ago
- ☆71Updated this week
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)☆50Updated 4 months ago
- A Survey on the Honesty of Large Language Models☆56Updated 3 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆41Updated 8 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆54Updated 7 months ago
- [AAAI 2025] Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning☆30Updated 5 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆78Updated last month
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning☆29Updated last month
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆96Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆130Updated last month
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆31Updated 3 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆98Updated last week
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆73Updated last month
- Paper List of Inference/Test Time Scaling/Computing☆127Updated last week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆49Updated last week
- Official GitHub repo of G-LLaVA☆131Updated last month