FusionBrainLab / Vision_GRPO
☆86 Updated 11 months ago
Alternatives and similar repositories for Vision_GRPO
Users who are interested in Vision_GRPO are comparing it to the repositories listed below.
- ☆54 Updated last year
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆149 Updated 3 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆68 Updated 2 years ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆171 Updated last year
- ☆110 Updated last year
- ☆74 Updated 8 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆145 Updated last year
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆62 Updated last year
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ☆135 Updated last year
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models ☆46 Updated last year
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆44 Updated 9 months ago
- ☆124 Updated last year
- Visual self-questioning for large vision-language assistant. ☆45 Updated 6 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆62 Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 Updated 8 months ago
- [NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO ☆78 Updated 3 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆89 Updated 11 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆152 Updated 3 months ago
- Collect the awesome works evolved around reasoning models like O1/R1 in visual domain ☆53 Updated 6 months ago
- ☆16 Updated 10 months ago
- GRPO Algorithm for Llava Architecture (Based on Verl) ☆47 Updated 8 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆100 Updated 2 years ago
- Official Codebase for "Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers" ☆24 Updated 8 months ago
- A RLHF Infrastructure for Vision-Language Models ☆195 Updated last year
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆147 Updated last year
- ☆107 Updated 7 months ago
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ☆277 Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs ☆176 Updated 4 months ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details. ☆32 Updated 11 months ago
- PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning ☆232 Updated last year