YanqiDai / MMRole
(ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
★89, updated 11 months ago
Alternatives and similar repositories for MMRole
Users that are interested in MMRole are comparing it to the libraries listed below
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1). ★35, updated 9 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models. ★155, updated 6 months ago
- ★59, updated last year
- ★177, updated last month
- Test-time preference optimization (ICML 2025). ★177, updated 8 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ★164, updated 3 months ago
- ★90, updated last year
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ★382, updated 4 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ★28, updated 11 months ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ★65, updated 8 months ago
- [arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents. ★46, updated 6 months ago
- ★133, updated 2 months ago
- 🔧 Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning. ★306, updated 2 weeks ago
- Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning". ★60, updated 2 months ago
- [ICML2025] The official implementation of "C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Gene… ★41, updated 8 months ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ★74, updated 7 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025). ★172, updated 2 months ago
- The code and data of DPA-RAG, accepted by WWW 2025 main conference. ★63, updated 2 months ago
- Scaling Preference Data Curation via Human-AI Synergy. ★135, updated 6 months ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ★94, updated 2 months ago
- A Self-Training Framework for Vision-Language Reasoning. ★88, updated 11 months ago
- ★70, updated 7 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI. ★236, updated 3 months ago
- Paper collections of multi-modal LLM for Math/STEM/Code. ★134, updated 2 months ago
- ★57, updated 6 months ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents. ★214, updated 8 months ago
- ★111, updated 7 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ★82, updated 2 months ago
- ★153, updated 7 months ago
- A Survey of Direct Preference Optimization (DPO). ★88, updated 6 months ago