InternLM / ARM-Thinker
Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
☆79Updated 2 months ago
Alternatives and similar repositories for ARM-Thinker
Users who are interested in ARM-Thinker are comparing it to the repositories listed below.
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆86Updated 6 months ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆58Updated last week
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆237Updated this week
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated last week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆207Updated 3 months ago
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆207Updated last week
- ☆141Updated 3 months ago
- ☆81Updated 7 months ago
- ☆59Updated 5 months ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench…☆86Updated last week
- The code repository of UniRL☆51Updated 8 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆138Updated 8 months ago
- ☆35Updated 2 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆37Updated last year
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆74Updated 4 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆142Updated last week
- ☆169Updated 2 months ago
- Doodling our way to AGI ✏️ 🖼️ 🧠☆120Updated 8 months ago
- A collection of awesome thinking-with-videos papers.☆86Updated 2 months ago
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling☆186Updated last week
- ICML2025☆63Updated 5 months ago
- Official implementation of MIA-DPO☆70Updated last year
- Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning☆214Updated this week
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation☆32Updated 3 weeks ago
- Official implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark.☆52Updated 3 weeks ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆52Updated 6 months ago
- TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models☆64Updated 2 months ago
- ☆97Updated 7 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆236Updated 5 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆137Updated 5 months ago