mat-agent / MAT-Agent
☆50 Updated 3 weeks ago
Alternatives and similar repositories for MAT-Agent
Users who are interested in MAT-Agent are comparing it to the libraries listed below.
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆145 Updated 2 months ago
- An RLHF Infrastructure for Vision-Language Models ☆179 Updated 8 months ago
- ☆76 Updated last year
- A Self-Training Framework for Vision-Language Reasoning ☆80 Updated 5 months ago
- A comprehensive collection of process reward models. ☆95 Updated 3 weeks ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆66 Updated 4 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆72 Updated last month
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆84 Updated last week
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning ☆148 Updated last month
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆78 Updated last month
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆55 Updated 10 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆117 Updated 3 weeks ago
- ☆100 Updated last year
- Paper collections of multi-modal LLM for Math/STEM/Code. ☆114 Updated last week
- ☆152 Updated 8 months ago
- Paper List of Inference/Test Time Scaling/Computing ☆280 Updated 2 weeks ago
- ☆102 Updated last week
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆124 Updated last month
- ☆25 Updated 5 months ago
- ☆57 Updated last month
- ☆46 Updated 3 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆89 Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆32 Updated last week
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆58 Updated last week
- ☆83 Updated 6 months ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆103 Updated 11 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning! ☆47 Updated 3 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆698 Updated last week
- Official GitHub repo of G-LLaVA ☆146 Updated 4 months ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆342 Updated 6 months ago