mat-agent / MAT-Agent
☆41Updated this week
Alternatives and similar repositories for MAT-Agent
Users who are interested in MAT-Agent are comparing it to the libraries listed below.
- A Self-Training Framework for Vision-Language Reasoning☆80Updated 4 months ago
- ☆74Updated last year
- ☆24Updated 3 months ago
- A comprehensive collection of process reward models.☆88Updated 2 weeks ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆63Updated 2 months ago
- ☆101Updated last month
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆50Updated 7 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…☆126Updated this week
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆55Updated 9 months ago
- ☆45Updated last month
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆44Updated 2 weeks ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆51Updated last week
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆107Updated last month
- ☆77Updated 4 months ago
- ☆147Updated 7 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆65Updated this week
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"☆68Updated 3 months ago
- This repository will continuously update the latest papers, technical reports, and benchmarks on multimodal reasoning!☆41Updated 2 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆42Updated 2 weeks ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆29Updated last month
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆63Updated 10 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU☆47Updated last year
- Official repo for EscapeCraft (a 3D environment for room escape) and benchmark MM-Escape☆15Updated last week
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"☆103Updated 2 weeks ago
- A curated collection of resources, tools, and frameworks for developing GUI Agents.☆51Updated last week
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆88Updated last year
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆96Updated 10 months ago
- ☆26Updated 6 months ago
- An RLHF Infrastructure for Vision-Language Models☆176Updated 6 months ago
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated last month