mat-agent/MAT-Agent (☆32 · Updated last month)
Alternatives and similar repositories for MAT-Agent
Users interested in MAT-Agent are comparing it to the repositories listed below.
- A Self-Training Framework for Vision-Language Reasoning (☆78 · Updated 3 months ago)
- ☆73 · Updated 11 months ago
- A comprehensive collection of process reward models (☆76 · Updated last week)
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" (☆55 · Updated 8 months ago)
- ☆43 · Updated last month
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models (☆60 · Updated 2 months ago)
- ☆24 · Updated 3 months ago
- ☆99 · Updated last year
- An RLHF Infrastructure for Vision-Language Models (☆175 · Updated 6 months ago)
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation (☆54 · Updated last week)
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models (☆74 · Updated 6 months ago)
- ☆95 · Updated last month
- A continuously updated collection of the latest papers, technical reports, and benchmarks on multimodal reasoning (☆38 · Updated last month)
- [arXiv 2504.09130] VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search (☆16 · Updated 3 weeks ago)
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models (☆73 · Updated 11 months ago)
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents (☆76 · Updated last week)
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models (☆56 · Updated 10 months ago)
- ☆75 · Updated 4 months ago
- ☆146 · Updated 6 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models (☆107 · Updated 3 weeks ago)
- [arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding (☆46 · Updated 5 months ago)
- ☆35 · Updated 10 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency (☆104 · Updated 2 weeks ago)
- Code for the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects" (☆16 · Updated 2 weeks ago)
- Data and code for the CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding" (☆67 · Updated 2 months ago)
- [NeurIPS'24] SpatialEval: a benchmark to evaluate the spatial reasoning abilities of MLLMs and LLMs (☆35 · Updated 3 months ago)
- VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues (☆40 · Updated 2 months ago)
- TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos (☆34 · Updated last week)
- The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio (☆46 · Updated 7 months ago)
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning): Diving into Self-Evolving Training for Multimodal Reasoning (☆58 · Updated 4 months ago)