Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆1,907 · Updated last week
Alternatives and similar repositories for Visual-RFT
Users that are interested in Visual-RFT are comparing it to the libraries listed below
- A fork to add multimodal model training to open-r1 ☆1,272 · Updated 3 months ago
- This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta… ☆573 · Updated 3 weeks ago
- ☆977 · Updated this week
- An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud. ☆763 · Updated this week
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. ☆758 · Updated 2 weeks ago
- This repository provides a valuable reference for researchers in the field of multimodality; please start your exploratory journey in RL-bas… ☆785 · Updated this week
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey ☆591 · Updated last week
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆521 · Updated last month
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization ☆374 · Updated last month
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆619 · Updated last week
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ☆379 · Updated last week
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL ☆2,441 · Updated last week
- Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks ☆2,446 · Updated this week
- Solve Visual Understanding with Reinforced VLMs ☆4,990 · Updated 3 weeks ago
- A Framework of Small-scale Large Multimodal Models ☆824 · Updated last month
- Explore the Multimodal “Aha Moment” on 2B Model ☆587 · Updated 2 months ago
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆535 · Updated 2 weeks ago
- VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024) ☆470 · Updated last month
- ☆362 · Updated 3 months ago
- Align Anything: Training All-modality Model with Feedback ☆3,814 · Updated this week
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models ☆431 · Updated 2 weeks ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆868 · Updated last month
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆340 · Updated 3 months ago
- Next-Token Prediction is All You Need ☆2,134 · Updated 2 months ago
- NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing ☆541 · Updated 7 months ago
- Official Repository of Cooragent ☆1,160 · Updated last week
- A journey to real multimodal R1! We are working on large-scale experiments. ☆305 · Updated 2 weeks ago
- ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction ☆2,307 · Updated 2 months ago
- A paper list of recent work on token compression for ViT and VLMs ☆489 · Updated this week
- Awesome RL Reasoning Recipes ("Triple R") ☆580 · Updated this week