aiming-lab / MIRA
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
☆26 Updated 3 months ago
Alternatives and similar repositories for MIRA
Users who are interested in MIRA are comparing it to the repositories listed below.
- ☆23 Updated last week
- Scaling Agentic Environments Automatically. ☆47 Updated 2 weeks ago
- More reliable Video Understanding Evaluation ☆13 Updated 4 months ago
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25] ☆33 Updated 5 months ago
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs ☆35 Updated 3 weeks ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆44 Updated last year
- Official Repository of Native Parallel Reasoner ☆100 Updated 3 weeks ago
- Ring-V2 is a reasoning MoE LLM provided and open-sourced by InclusionAI. ☆90 Updated 3 months ago
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents ☆40 Updated this week
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 Updated 6 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration" ☆136 Updated 5 months ago
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆89 Updated 8 months ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 Updated last year
- ☆50 Updated 2 months ago
- Stable-DiffCoder is a family of lightweight open-source code DLLMs (diffusion large language models) comprising base and instruct models, … ☆65 Updated 2 weeks ago
- ☆64 Updated this week
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆63 Updated last year
- ☆68 Updated 4 months ago
- ☆42 Updated 6 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 Updated 11 months ago
- ☆46 Updated 7 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆40 Updated last week
- Official Repo for MageBench: Bridging Large Multimodal Models to Agents ☆22 Updated last year
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆37 Updated 4 months ago
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆65 Updated 2 weeks ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆37 Updated last year
- A holistic benchmark for LLM abstention ☆69 Updated 5 months ago
- EAFT (Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting) official repo ☆82 Updated 3 weeks ago
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆107 Updated 6 months ago