The Next Step Forward in Multimodal LLM Alignment
☆199May 1, 2025Updated 10 months ago
Alternatives and similar repositories for MM-RLHF
Users that are interested in MM-RLHF are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation☆31Mar 28, 2025Updated last year
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Apr 10, 2025Updated 11 months ago
- ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy☆306May 14, 2025Updated 10 months ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…☆77Apr 28, 2025Updated 11 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?☆155Oct 21, 2025Updated 5 months ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning☆283May 9, 2025Updated 10 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆164Dec 26, 2024Updated last year
- ✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…☆406Jan 14, 2026Updated 2 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis☆739Dec 8, 2025Updated 3 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆32Jul 16, 2025Updated 8 months ago
- ☆38May 28, 2025Updated 10 months ago
- ☆38Jul 9, 2024Updated last year
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction☆2,500Mar 28, 2025Updated last year
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆137Aug 5, 2025Updated 7 months ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Sep 3, 2025Updated 6 months ago
- [ICCV'25] When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning☆50Feb 16, 2026Updated last month
- A fork to add multimodal model training to open-r1☆1,514Feb 8, 2025Updated last year
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models☆1,048Mar 15, 2026Updated 2 weeks ago
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex☆748Mar 19, 2026Updated last week
- R1-onevision, a visual language model capable of deep CoT reasoning.☆578Apr 13, 2025Updated 11 months ago
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception☆608May 8, 2024Updated last year
- ✨✨ [ICLR 2026] Think Beyond Images☆590Sep 23, 2025Updated 6 months ago
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆846May 14, 2025Updated 10 months ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'’☆2,321Oct 29, 2025Updated 5 months ago
- Aligning LMMs with Factually Augmented RLHF☆393Nov 1, 2023Updated 2 years ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness☆449May 14, 2025Updated 10 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM☆374May 27, 2025Updated 10 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆20May 27, 2025Updated 10 months ago
- [CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources☆218Sep 26, 2025Updated 6 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆109May 27, 2025Updated 10 months ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning☆772Sep 7, 2025Updated 6 months ago
- ☆31Jul 21, 2025Updated 8 months ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- [ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that…☆1,094Mar 20, 2026Updated last week
- Official Repository of ACL 2025 paper OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference☆144Mar 2, 2025Updated last year
- A RLHF Infrastructure for Vision-Language Models☆197Nov 15, 2024Updated last year
- [COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources☆309Aug 25, 2025Updated 7 months ago
- [ICML 2025 Oral] This is the official repository of the paper "What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensi…☆22Jun 12, 2025Updated 9 months ago
- ☆23Feb 3, 2026Updated last month
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs☆310May 21, 2025Updated 10 months ago