[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆145Feb 23, 2026Updated last week
Alternatives and similar repositories for OpenMMReasoner
Users who are interested in OpenMMReasoner are comparing it to the repositories listed below
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆117Feb 4, 2026Updated 3 weeks ago
- Quick Long Video Understanding [TMLR2025]☆76Oct 27, 2025Updated 4 months ago
- Agent-OM: Leveraging LLM Agents for Ontology Matching☆18Jan 24, 2026Updated last month
- Image Tokenizer Needs Post-Training☆24Oct 4, 2025Updated 4 months ago
- Benchmarking and Analyzing Generative Data for Visual Recognition☆26Jul 25, 2023Updated 2 years ago
- Autonomous AI backend for deep research AI applications.☆25Feb 17, 2026Updated last week
- More reliable Video Understanding Evaluation☆14Sep 23, 2025Updated 5 months ago
- DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation.☆127Feb 10, 2026Updated 2 weeks ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆45Jul 2, 2025Updated 8 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆114Dec 24, 2025Updated 2 months ago
- ☆68Nov 5, 2025Updated 3 months ago
- The open-source code of QAgent☆53Oct 14, 2025Updated 4 months ago
- "FusionFactory: Fusing LLM Capabilities with Routing Data", Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Sh…☆19Dec 30, 2025Updated 2 months ago
- The code for the paper *The Sensitivity of Counterfactual Fairness to Unmeasured Confounding* @ UAI 2019☆14Apr 4, 2020Updated 5 years ago
- Syphus: Automatic Instruction-Response Generation Pipeline☆14Dec 14, 2023Updated 2 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- Lightweight Transformer for Multi-modal Tasks☆16Dec 9, 2022Updated 3 years ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆128Jul 24, 2025Updated 7 months ago
- Code for Label Propagation for Zero-shot Classification with Vision-Language Models (CVPR2024)☆45Jul 23, 2024Updated last year
- ☆21Aug 8, 2024Updated last year
- RLLaVA is a user-friendly framework for multi-modal RL research and optimized for resource-constrained teams.☆56Feb 22, 2026Updated last week
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆413May 5, 2025Updated 9 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding☆79Dec 14, 2025Updated 2 months ago
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation☆45Feb 27, 2023Updated 3 years ago
- MiMo-VL☆628Aug 21, 2025Updated 6 months ago
- ☆61Dec 5, 2025Updated 2 months ago
- ☆50Dec 11, 2025Updated 2 months ago
- Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…☆91Jan 29, 2026Updated last month
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆141Aug 21, 2025Updated 6 months ago
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…☆50Aug 23, 2024Updated last year
- Process Orchestration Framework: A Camunda 7 fork☆21Updated this week
- Fully Open Framework for Democratized Multimodal Training☆754Dec 27, 2025Updated 2 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]☆182Jun 5, 2025Updated 8 months ago
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence☆259Feb 13, 2026Updated 2 weeks ago
- Official implementation for paper "How Far Are We from Genuinely Useful Deep Research Agents?"☆64Dec 10, 2025Updated 2 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆24Nov 25, 2024Updated last year
- Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs (ECCV 2024)☆19Jul 15, 2024Updated last year
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.☆106Jun 29, 2025Updated 8 months ago
- Official implementation of paper "Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models"☆66Jan 13, 2026Updated last month