OpenIXCLab / CODA
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
☆33 Updated 5 months ago
Alternatives and similar repositories for CODA
Users who are interested in CODA are comparing it to the repositories listed below.
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆37 Updated last year
- ☆35 Updated 2 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆79 Updated last month
- ☆63 Updated 6 months ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆52 Updated 6 months ago
- Quick Long Video Understanding [TMLR2025] ☆74 Updated 3 months ago
- ☆80 Updated 7 months ago
- More reliable Video Understanding Evaluation ☆13 Updated 4 months ago
- ☆39 Updated 8 months ago
- Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images ☆53 Updated 2 months ago
- ☆39 Updated last month
- GenExam: A Multidisciplinary Text-to-Image Exam ☆55 Updated last month
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 Updated 9 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆47 Updated 6 months ago
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation ☆31 Updated 3 weeks ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆47 Updated 11 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆206 Updated 3 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆79 Updated last year
- VCode: SVG as Symbolic Visual Representation ☆120 Updated last month
- Multimodal RewardBench ☆60 Updated 11 months ago
- Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios" ☆27 Updated 6 months ago
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆205 Updated this week
- ☆15 Updated 8 months ago
- ☆13 Updated last year
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆86 Updated 6 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆39 Updated 7 months ago
- Test-time Scaling for VAR models ☆30 Updated 4 months ago
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio… ☆98 Updated this week
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation ☆28 Updated last month
- On Path to Multimodal Generalist: General-Level and General-Bench ☆19 Updated 6 months ago