☆60 · Mar 8, 2026 · Updated last week
Alternatives and similar repositories for critique-GRPO
Users who are interested in critique-GRPO are comparing it to the repositories listed below.
- Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning ☆42 · Nov 11, 2025 · Updated 4 months ago
- Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition ☆31 · May 14, 2025 · Updated 10 months ago
- ☆18 · Nov 20, 2024 · Updated last year
- ☆13 · Nov 11, 2022 · Updated 3 years ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects ☆22 · Feb 23, 2025 · Updated last year
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆27 · Aug 9, 2025 · Updated 7 months ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆22 · Nov 9, 2025 · Updated 4 months ago
- ☆22 · Nov 11, 2024 · Updated last year
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆420 · Oct 4, 2025 · Updated 5 months ago
- ☆60 · Feb 27, 2026 · Updated 2 weeks ago
- ☆48 · Oct 2, 2025 · Updated 5 months ago
- ☆111 · Dec 10, 2025 · Updated 3 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆421 · Jul 11, 2025 · Updated 8 months ago
- ☆43 · Aug 15, 2025 · Updated 7 months ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆71 · Apr 2, 2025 · Updated 11 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆71 · Jul 13, 2025 · Updated 8 months ago
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs ☆45 · Jun 17, 2025 · Updated 9 months ago
- [CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆216 · Sep 26, 2025 · Updated 5 months ago
- Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling ☆29 · Jan 24, 2026 · Updated last month
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆53 · Jul 23, 2025 · Updated 7 months ago
- ☆33 · Oct 31, 2024 · Updated last year
- [ICLR 2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" ☆52 · Feb 4, 2026 · Updated last month
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆161 · Nov 2, 2024 · Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆40 · Feb 5, 2024 · Updated 2 years ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆39 · Sep 22, 2024 · Updated last year
- image retrieval using metric learning ☆10 · Nov 22, 2022 · Updated 3 years ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆93 · Dec 3, 2024 · Updated last year
- this is an implementation for the paper Improve Mathematical Reasoning in Language Models by Automated Process Supervision from google de… ☆44 · Jul 8, 2025 · Updated 8 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆88 · Jan 23, 2025 · Updated last year
- Diffusion for EEG ☆11 · Jan 2, 2023 · Updated 3 years ago
- grpo to train long form QA and instructions with long-form reward model ☆17 · Jul 17, 2025 · Updated 8 months ago
- Official Implementation of "Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts" at EMNLP 202… ☆13 · Oct 27, 2024 · Updated last year
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning" ☆34 · Nov 11, 2025 · Updated 4 months ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year
- Python SDK for dataset generation on LightningRod platform ⚡ ☆26 · Updated this week
- MIP21 example ☆15 · Jun 20, 2022 · Updated 3 years ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence ☆10 · Mar 2, 2025 · Updated last year
- ☆25 · Aug 19, 2025 · Updated 6 months ago
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆24 · Mar 6, 2026 · Updated last week