alphadl / R1
Enhanced GRPO with more verifiable rewards and real-time evaluators
⭐ 37 · Updated 6 months ago
Alternatives and similar repositories for R1
Users interested in R1 are comparing it to the repositories listed below.
- [ICML 2024] Selecting High-Quality Data for Training Language Models · ⭐ 196 · Updated last week
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" · ⭐ 138 · Updated last month
- Model merging is a highly efficient approach for long-to-short reasoning. · ⭐ 92 · Updated 2 months ago
- Repository for "Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning" · ⭐ 168 · Updated last year
- [2025-TMLR] A Survey on the Honesty of Large Language Models · ⭐ 63 · Updated last year
- My commonly used tools · ⭐ 63 · Updated 11 months ago
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models · ⭐ 58 · Updated last year
- The repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models · ⭐ 177 · Updated 5 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" · ⭐ 138 · Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ⭐ 184 · Updated 5 months ago
- The repo for the paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" · ⭐ 52 · Updated last year
- Extrapolating RLVR to General Domains without Verifiers · ⭐ 184 · Updated 4 months ago
- Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping · ⭐ 61 · Updated 6 months ago
- [ACL 2025] A Neural-Symbolic Self-Training Framework · ⭐ 117 · Updated 6 months ago
- ⭐ 18 · Updated last year
- Code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi… · ⭐ 117 · Updated 6 months ago
- Paper list and datasets for the paper "A Survey on Data Selection for LLM Instruction Tuning" · ⭐ 47 · Updated last year
- ⭐ 292 · Updated 5 months ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) · ⭐ 181 · Updated 10 months ago
- A method of ensemble learning for heterogeneous large language models. · ⭐ 64 · Updated last year
- [ICML'2024] Can AI Assistants Know What They Don't Know? · ⭐ 85 · Updated last year
- [NeurIPS 2024] The official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" · ⭐ 134 · Updated 9 months ago
- Paper collections of multimodal LLMs for Math/STEM/Code · ⭐ 131 · Updated last month
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" · ⭐ 83 · Updated 11 months ago
- [ACL '25] The official code repository for "PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models" · ⭐ 85 · Updated 10 months ago
- A comprehensive collection of work on learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model… · ⭐ 60 · Updated 6 months ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating · ⭐ 98 · Updated last year
- LLaMA-MoE v2: Exploring Sparsity of LLaMA from the Perspective of Mixture-of-Experts with Post-Training · ⭐ 89 · Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning): Diving into Self-Evolving Training for Multimodal Reasoning · ⭐ 69 · Updated 5 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs · ⭐ 135 · Updated 7 months ago