GAD-cell / vlm-grpo
An implementation of GRPO for training Unsloth's VLMs
☆78Updated 5 months ago
Alternatives and similar repositories for vlm-grpo
Users that are interested in vlm-grpo are comparing it to the libraries listed below
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆147Updated 9 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆143Updated 8 months ago
- Visual Planning: Let's Think Only with Images☆294Updated 8 months ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆159Updated 6 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".☆282Updated 11 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models☆227Updated 2 months ago
- minimal GRPO implementation from scratch☆102Updated 10 months ago
- A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.☆248Updated 9 months ago
- [ICLR 2026] Geometric-Mean Policy Optimization☆98Updated last week
- ☆56Updated last year
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C…☆277Updated last year
- LLaDA2.0 is the diffusion language model series developed by InclusionAI team, Ant Group.☆236Updated last month
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning☆330Updated 8 months ago
- Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization☆349Updated 3 weeks ago
- ☆92Updated 8 months ago
- Tina: Tiny Reasoning Models via LoRA☆316Updated 4 months ago
- MatFormer repo☆70Updated last year
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆129Updated 6 months ago
- An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.☆174Updated 3 months ago
- Reproduction of DeepSeek-R1☆242Updated 9 months ago
- Implementations and experimentation on mHC by DeepSeek - https://arxiv.org/abs/2512.24880 ☆278Updated 3 weeks ago
- Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training☆315Updated 9 months ago
- Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan…☆147Updated last year
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.☆62Updated 9 months ago
- Training teachers with reinforcement learning so that they can teach LLMs how to reason for test-time scaling.☆358Updated 7 months ago
- ☆206Updated last year
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs☆310Updated 8 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework☆71Updated 8 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA☆378Updated 7 months ago
- ☆169Updated 4 months ago