lime-RL/DCPO



DCPO: Dynamic Adaptive Clipping for RL

☆49

Alternatives and similar repositories for DCPO

Users who are interested in DCPO are comparing it to the repositories listed below.


vivekvar-dl / GSPO-DeepSeek-R1-Distill-Qwen-1.5B
☆18 · Mar 15, 2026 · Updated 4 months ago
juyongjiang / KaSA
[ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models"
☆22 · Jan 16, 2025 · Updated last year
gccnlp / Light-PEFT
[ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
☆13 · Sep 2, 2024 · Updated last year
multimodal-art-projection / TreePO
☆65 · Mar 30, 2026 · Updated 3 months ago
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆44 · Aug 6, 2025 · Updated 11 months ago
emmyqin / iw_sft
☆28 · Jul 18, 2025 · Updated last year
AngelaZZZ-611 / reasoning_models_probing
☆21 · May 14, 2026 · Updated 2 months ago
foreverlasting1202 / QuestA
☆22 · Jan 2, 2026 · Updated 6 months ago
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 · Jul 23, 2025 · Updated last year
Time-Search / TimeSearch-R
[ICLR 2026] Official code for paper: TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinf…
☆27 · Jan 29, 2026 · Updated 5 months ago
HectorHHZ / Sparse_Matrix_Tuning
Github repo for ICLR-2025 paper, Fine-tuning Large Language Models with Sparse Matrices
☆26 · Feb 2, 2026 · Updated 5 months ago
sheriyuo / DART
Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning
☆32 · May 7, 2026 · Updated 2 months ago
taoszhang / MMhops-R1
MMhops-R1: Multimodal Multi-hop Reasoning
☆16 · Feb 28, 2026 · Updated 4 months ago
zwhong714 / PSFT
[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co…
☆38 · Sep 9, 2025 · Updated 10 months ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆54 · Jul 15, 2025 · Updated last year
UMass-Embodied-AGI / BudgetGuidance
[ACL'26 Findings] Steering LLM Thinking with Budget Guidance
☆33 · Feb 19, 2026 · Updated 5 months ago
Infini-AI-Lab / GRESO
☆81 · Jun 8, 2026 · Updated last month
Trae1ounG / Pretrain_Space_RLVR
[arxiv: 2604.14142] From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
☆17 · Apr 16, 2026 · Updated 3 months ago
nick7nlp / FastCuRL
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning (EMNLP 2025)
☆61 · Oct 10, 2025 · Updated 9 months ago
tangzhy / RealCritic
☆15 · Jan 27, 2025 · Updated last year
FloyedShen / VESPO
☆34 · Feb 12, 2026 · Updated 5 months ago
wutaiqiang / MI
Official code for paper "Revisiting Model Interpolation for Efficient Reasoning"
☆17 · Jul 14, 2026 · Updated last week
ZJU-REAL / InftyThink-Plus
[ICML 2026] InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
☆34 · May 25, 2026 · Updated 2 months ago
kleinercubs / ImgFact
Beyond Entities: A Large-Scale Multi-Modal Knowledge Graph with Triplet Fact Grounding
☆11 · May 23, 2024 · Updated 2 years ago
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 · Apr 11, 2026 · Updated 3 months ago
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆50 · Sep 23, 2025 · Updated 10 months ago
mukhal / ThinkPRM
[TMLR] Process Reward Models That Think
☆89 · Nov 29, 2025 · Updated 7 months ago
LaVi-Lab / FTTT
[ACL 2025] Official code for "Learning to Reason from Feedback at Test-Time".
☆13 · May 16, 2025 · Updated last year
JIA-Lab-research / Scaf-GRPO
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
☆22 · Feb 8, 2026 · Updated 5 months ago
frt03 / jax_dt
Minimal Decision Transformer Implementation written in Jax (Flax).
☆18 · Aug 8, 2022 · Updated 3 years ago
shiweijiezero / R3L
☆23 · Apr 5, 2026 · Updated 3 months ago
pzs19 / LEMMA
☆16 · Sep 4, 2025 · Updated 10 months ago
BytedTsinghua-SIA / Enigmata
Resources for the Enigmata Project.
☆82 · Aug 13, 2025 · Updated 11 months ago
597358816 / AEPO
Arbitrary Entropy Policy Optimization: Entropy Is Controllable in Reinforcement Fine-tuning
☆17 · Jan 19, 2026 · Updated 6 months ago
Kwai-Klear / RLEP
RL with Experience Replay
☆59 · Jul 27, 2025 · Updated 11 months ago
Lux0926 / ASPRM
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
☆10 · Mar 2, 2025 · Updated last year
wtybest / EnMMDiT
[TPAMI 2026] Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
☆15 · Mar 7, 2026 · Updated 4 months ago
alibaba-damo-academy / VL-Cogito
☆24 · Nov 4, 2025 · Updated 8 months ago