xiwenc1 / DRA-GRPO
Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
☆15 · Updated last week
Alternatives and similar repositories for DRA-GRPO
Users who are interested in DRA-GRPO are comparing it to the repositories listed below.
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆34 · Updated 5 months ago
- A Sober Look at Language Model Reasoning ☆74 · Updated last week
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆25 · Updated last week
- What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆32 · Updated 2 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆22 · Updated 4 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆88 · Updated 8 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆59 · Updated 3 months ago
- Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning ☆33 · Updated 7 months ago
- ☆46 · Updated 2 months ago
- ☆18 · Updated last month
- Code for Merging Large Language Models ☆32 · Updated 10 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆59 · Updated last year
- ☆139 · Updated last month
- [NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β ☆45 · Updated 8 months ago
- ACL 2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs, and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆28 · Updated 3 weeks ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 · Updated last year
- One-shot Entropy Minimization ☆149 · Updated last week
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆32 · Updated 4 months ago
- EMPO, A Fully Unsupervised RLVR Method ☆40 · Updated 2 weeks ago
- ☆57 · Updated 7 months ago
- Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆27 · Updated 2 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆123 · Updated 2 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆70 · Updated 4 months ago
- This is the official implementation of ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting ☆19 · Updated 10 months ago
- PyTorch implementation of Tree Preference Optimization (TPO) (Accepted by ICLR'25) ☆19 · Updated 2 months ago
- This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib… ☆18 · Updated 3 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆73 · Updated 4 months ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆60 · Updated 3 weeks ago
- Lightweight Adapting for Black-Box Large Language Models ☆22 · Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆46 · Updated 3 weeks ago