microsoft / MageBench
Official Repo for MageBench: Bridging Large Multimodal Models to Agents
☆21 Updated 5 months ago
Alternatives and similar repositories for MageBench
Users who are interested in MageBench are comparing it to the repositories listed below:
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆91 Updated last month
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆72 Updated 2 weeks ago
- ☆80 Updated 5 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆46 Updated last month
- ☆60 Updated 4 months ago
- Pixel-Level Reasoning Model trained with RL ☆145 Updated this week
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆63 Updated this week
- ☆29 Updated 2 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆44 Updated 4 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆46 Updated last month
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆68 Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆71 Updated 7 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆58 Updated this week
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆79 Updated last month
- ☆23 Updated 3 weeks ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆64 Updated 11 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆36 Updated last week
- ☆43 Updated last month
- A Self-Training Framework for Vision-Language Reasoning ☆80 Updated 5 months ago
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning ☆16 Updated 7 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆57 Updated 8 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆37 Updated 5 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆76 Updated last week
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆39 Updated 2 weeks ago
- ☆35 Updated 2 weeks ago
- The implementation of the paper: "Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models" ☆33 Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆34 Updated 11 months ago
- ☆56 Updated 2 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆69 Updated 3 weeks ago
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag… ☆32 Updated last year