OpenIXCLab / CODA
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
☆25 · Updated this week
Alternatives and similar repositories for CODA
Users who are interested in CODA are comparing it to the repositories listed below
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 · Updated 7 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆68 · Updated last month
- ☆70 · Updated 2 months ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆48 · Updated last month
- Quick Long Video Understanding ☆62 · Updated 2 months ago
- ☆37 · Updated 3 months ago
- ☆53 · Updated 3 weeks ago
- ☆53 · Updated last month
- On Path to Multimodal Generalist: General-Level and General-Bench ☆19 · Updated last month
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 · Updated 2 weeks ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆96 · Updated 3 weeks ago
- Test-time Scaling for VAR models ☆21 · Updated last month
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆47 · Updated last month
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆79 · Updated last month
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆36 · Updated 2 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆17 · Updated 4 months ago
- [Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Mul… ☆29 · Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆72 · Updated 3 weeks ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆53 · Updated 2 months ago
- VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆52 · Updated last month
- Text-Only Data Synthesis for Vision Language Model Training ☆21 · Updated 2 months ago
- ☆87 · Updated 2 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆32 · Updated 2 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆145 · Updated 3 weeks ago
- Official implementation of MIA-DPO ☆64 · Updated 7 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆74 · Updated 3 weeks ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆45 · Updated last month
- SFT+RL boosts multimodal reasoning ☆27 · Updated 2 months ago
- ☆30 · Updated 8 months ago
- ☆122 · Updated 2 months ago