ZhangXJ199/EDGE-GRPO

Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

☆22

Alternatives and similar repositories for EDGE-GRPO

Users who are interested in EDGE-GRPO are comparing it to the repositories listed below.

ZhangXJ199 / TinyLLaVA-Video-R1
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
☆116 · Dec 24, 2025 · Updated 7 months ago
martian422 / MaskGRPO
The official implementation of MaskGRPO: Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models. (ICLR 2026, arxiv…
☆19 · Jan 27, 2026 · Updated 6 months ago
AndreHe02 / rewarding-unlikely-release
☆15 · Jun 10, 2025 · Updated last year
zwhong714 / PSFT
[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co…
☆39 · Sep 9, 2025 · Updated 10 months ago
zqOuO / GWT
☆13 · May 4, 2026 · Updated 2 months ago
Open-Social-World / autolibra
AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback
☆19 · Apr 23, 2026 · Updated 3 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32 · Oct 9, 2025 · Updated 9 months ago
hkgc-1 / GHPO
☆62 · Jul 21, 2025 · Updated last year
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆44 · Aug 6, 2025 · Updated 11 months ago
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆63 · Sep 4, 2024 · Updated last year
THU-KEG / LRM-FactEval
☆17 · Jun 25, 2025 · Updated last year
SihengLi99 / RePO
RePO: Replay-Enhanced Policy Optimization
☆24 · Jun 12, 2025 · Updated last year
ZHITENGLI / AdaSVD
PyTorch code for our paper "AdaSVD: Adaptive Singular Value Decomposition for Large Language Models"
☆15 · Mar 9, 2025 · Updated last year
shengliu66 / FractionalReason
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
LiangruXie / Calibration-Process-in-Black-Box-LLMs
☆21 · Nov 26, 2024 · Updated last year
Jiahao004 / DeepTheorem
☆27 · Jun 10, 2025 · Updated last year
RUCAIBox / Passk_Training
The official repository of paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
☆113 · Aug 15, 2025 · Updated 11 months ago
xiaohangt / wd1
Official Implementation of wd1
☆32 · Sep 25, 2025 · Updated 10 months ago
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆58 · May 12, 2025 · Updated last year
ludc506 / InternVL-X
☆16 · Mar 26, 2025 · Updated last year
TianjinYellow / SPAM-Optimizer
☆36 · Mar 12, 2025 · Updated last year
ArminAzizi98 / LaMDA
☆15 · Nov 7, 2024 · Updated last year
Shenzhi-Wang / Beyond-the-80-20-Rule-RLVR
The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn…
☆61 · Jan 5, 2026 · Updated 6 months ago
Intelligent-Computing-Lab-Panda / TesseraQ
☆25 · Oct 31, 2024 · Updated last year
suu990901 / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆82 · Dec 25, 2025 · Updated 7 months ago
PKU-Baichuan-MLSystemLab / SysBench
SysBench: Can Large Language Models Follow System Messages?
☆40 · Sep 4, 2024 · Updated last year
multimodal-art-projection / TreePO
☆65 · Mar 30, 2026 · Updated 4 months ago
tangzhy / RealCritic
☆15 · Jan 27, 2025 · Updated last year
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆99 · Jun 16, 2025 · Updated last year
QingFei1 / R-Search
[ACL 2026] R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
☆35 · Jan 4, 2026 · Updated 6 months ago
uservan / speculative_thinking
☆34 · Oct 13, 2025 · Updated 9 months ago
Ignoramus0817 / SynthQuestions
☆19 · Jul 30, 2025 · Updated 11 months ago
EvanZhuang / AgenticLU
Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025).
☆13 · Sep 22, 2025 · Updated 10 months ago
Optimization-AI / DisCO
NeurIPS 2025: Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
☆53 · Mar 14, 2026 · Updated 4 months ago
Red-Hat-AI-Innovation-Team / SQuat
☆22 · Jun 5, 2025 · Updated last year
songmzhang / DSKDv2
The official implementation of the paper "A Dual-Space Framework for General Knowledge Distillation of Large Language Models".
☆18 · Jan 4, 2026 · Updated 6 months ago
Anonymous1252022 / fp4-all-the-way
☆51 · May 20, 2025 · Updated last year
opendatalab / FakeVLM
[NeurIPS 2025 🔥] FakeVLM: Advancing Synthetic Image Detection through Explainable Multimodal Models and Fine-Grained Artifact Analysis
☆157 · Sep 24, 2025 · Updated 10 months ago
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 · Jul 31, 2025 · Updated 11 months ago