xiwenc1 / DRA-GRPOLinks

Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models

☆21

Alternatives and similar repositories for DRA-GRPO

Users that are interested in DRA-GRPO are comparing it to the libraries listed below

Sorting:

tmlr-group / NoisyRationales
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
☆37Updated 3 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆87Updated last month
princeton-pli / what-makes-good-rm
[NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆39Updated last month
ZhentingWang / DUMP
☆32Updated 6 months ago
Optimization-AI / DisCO
Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
☆43Updated last week
Dereck0602 / Awesome_Test_Time_LLMs
☆131Updated 8 months ago
Joshua-Ren / Learning_dynamics_LLM
☆181Updated 6 months ago
Dongping-Chen / MLLM-Judge
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
☆86Updated 8 months ago
keven980716 / weak-to-strong-deception
[ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆13Updated last year
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆92Updated last year
ChnQ / MI-Peaks
☆55Updated 4 months ago
zitian-gao / one-shot-em
One-shot Entropy Minimization
☆187Updated 5 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆132Updated 7 months ago
rdi-berkeley / awesome-RLVR-boundary
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu…
☆76Updated 3 weeks ago
aeroplanepaper / GRPO-LEAD
☆30Updated last month
QingyangZhang / Label-Free-RLVR
☆282Updated 4 months ago
Persdre / NeurIPS-2024-LLM-Papers
Accepted LLM Papers in NeurIPS 2024
☆37Updated last year
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆80Updated 4 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆71Updated 6 months ago
SophieZheng998 / ALI-Agent
Official implementation for "ALI-Agent: Assessing LLMs'Alignment with Human Values via Agent-based Evaluation"
☆21Updated 3 months ago
deeplearning-wisc / haloscope
source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"
☆61Updated 7 months ago
EnnengYang / AdaMerging
AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024.
☆95Updated last year
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆43Updated 3 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆87Updated 9 months ago
junkangwu / beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
☆49Updated last year
tanganke / weight-ensembling_MoE
Code for paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts"
☆29Updated last year
eric-ai-lab / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆29Updated 4 months ago
lichengliu03 / unary-feedback
☆38Updated 3 months ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆62Updated last year
alenai97 / PEFT-MLLM
Official Code and data for ACL 2024 finding, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"
☆23Updated last year