Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆197 · May 1, 2025 · Updated 9 months ago
Alternatives and similar repositories for MM-RLHF
Users interested in MM-RLHF are comparing it to the repositories listed below.
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation ☆31 · Mar 28, 2025 · Updated 10 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆43 · Apr 10, 2025 · Updated 10 months ago
- ✨✨ Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy ☆305 · May 14, 2025 · Updated 9 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆151 · Oct 21, 2025 · Updated 3 months ago
- ✨✨ [AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi… ☆77 · Apr 28, 2025 · Updated 9 months ago
- ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆281 · May 9, 2025 · Updated 9 months ago
- ✨✨ [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ☆730 · Dec 8, 2025 · Updated 2 months ago
- Our 2nd-gen LMM ☆34 · May 22, 2024 · Updated last year
- ✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆164 · Dec 26, 2024 · Updated last year
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆48 · Sep 3, 2025 · Updated 5 months ago
- ☆37 · Jul 9, 2024 · Updated last year
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. ☆840 · May 14, 2025 · Updated 9 months ago
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex ☆706 · Feb 10, 2026 · Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 · May 27, 2025 · Updated 8 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆575 · Apr 13, 2025 · Updated 10 months ago
- A fork to add multimodal model training to open-r1 ☆1,474 · Feb 8, 2025 · Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands of Vision Task Types ☆33 · Jul 16, 2025 · Updated 7 months ago
- ✨✨ [NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction ☆2,487 · Mar 28, 2025 · Updated 10 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆136 · Aug 5, 2025 · Updated 6 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning ☆45 · Jul 2, 2025 · Updated 7 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆20 · May 27, 2025 · Updated 8 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆2,319 · Oct 29, 2025 · Updated 3 months ago
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆215 · Sep 26, 2025 · Updated 4 months ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆444 · May 14, 2025 · Updated 9 months ago
- Multimodal RewardBench ☆61 · Feb 21, 2025 · Updated 11 months ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆768 · Sep 7, 2025 · Updated 5 months ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" ☆49 · Jan 30, 2026 · Updated 2 weeks ago
- Official Repository of ACL 2025 paper OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference ☆145 · Mar 2, 2025 · Updated 11 months ago
- ✨✨ [NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi… ☆397 · Jan 14, 2026 · Updated last month
- ☆37 · May 28, 2025 · Updated 8 months ago
- CoV: Chain-of-View Prompting for Spatial Reasoning ☆50 · Jan 23, 2026 · Updated 3 weeks ago
- ☆31 · Jul 21, 2025 · Updated 6 months ago
- [ICLR 2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that… ☆765 · Jan 26, 2026 · Updated 3 weeks ago
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models ☆941 · Updated this week
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆310 · May 21, 2025 · Updated 8 months ago
- [ICCV'25] When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning ☆47 · Updated this week
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆59 · Dec 13, 2024 · Updated last year
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆75 · Sep 19, 2025 · Updated 4 months ago
- An RLHF Infrastructure for Vision-Language Models ☆196 · Nov 15, 2024 · Updated last year