HJYao00 / Mulberry
[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
☆1,223 Updated last month
Alternatives and similar repositories for Mulberry
Users who are interested in Mulberry are comparing it to the repositories listed below.
- ☆125 Updated last month
- [NeurIPS 2025 🔥] Main source code of SRPO framework. ☆176 Updated last month
- ScaleCUA: open-source computer-use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android). ☆709 Updated 3 weeks ago
- Explain Before You Answer: A Survey on Compositional Visual Reasoning ☆289 Updated 2 weeks ago
- ✨✨ R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆265 Updated 5 months ago
- [NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models ☆1,138 Updated 2 weeks ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models ☆118 Updated last year
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆138 Updated 7 months ago
- **Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos. ☆291 Updated 2 weeks ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆177 Updated 4 months ago
- A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale. ☆196 Updated this week
- R1-like Computer-use Agent ☆86 Updated 7 months ago
- (ICCV-2025 Official Code) Improving Generalist Model with Domain-Specific Experts ☆85 Updated this week
- Tree Search for LLM Agent Reinforcement Learning ☆229 Updated last month
- Awesome-Efficient-Inference-for-LRMs is a collection of state-of-the-art, novel, exciting, token-efficient methods for Large Reasoning Mo… ☆230 Updated 4 months ago
- ☆320 Updated 2 months ago
- [ICML 2025] "SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator" ☆552 Updated 3 months ago
- Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic alignment bottlenec… ☆111 Updated last month
- SDAR (Synergy of Diffusion and AutoRegression), a large diffusion language model (1.7B, 4B, 8B, 30B) ☆254 Updated last week
- [CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks" ☆427 Updated 5 months ago
- [EMNLP'25] s3 - ⚡ Efficient & Effective Search Agent Training via RL for RAG (Verifier-Powered RLVR for Search with Minimal Data) ☆773 Updated 2 weeks ago
- When Agent Becomes the Scientist - Building Closed-Loop System from Hypothesis to Verification ☆738 Updated last week
- [ICML2025] Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment ☆133 Updated 4 months ago
- Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs ☆161 Updated 7 months ago
- Codebase for Iterative DPO Using Rule-based Rewards ☆260 Updated 6 months ago
- [COLM'25] DeepRetrieval - 🔥 The First Search Agent Trained by On-Policy Reinforcement Learning ☆661 Updated 2 weeks ago
- Official Repository for Paper: The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning ☆51 Updated 6 months ago
- [ICLR 2025] Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models ☆61 Updated 9 months ago
- An adaptive sampling framework for Reinforce-style LLM post training. ☆78 Updated this week
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆145 Updated 9 months ago