sail-sg / Stable-RL
Rethinking the Trust Region in LLM Reinforcement Learning
☆34 · Updated this week
Alternatives and similar repositories for Stable-RL
Users interested in Stable-RL are comparing it to the repositories listed below.
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Updated last month
- ☆111 · Updated 4 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Updated 11 months ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient ☆66 · Updated 6 months ago
- ☆19 · Updated last year
- ☆66 · Updated 7 months ago
- ☆63 · Updated 7 months ago
- Research work aimed at addressing the problem of modeling infinite-length context ☆46 · Updated last month
- ☆15 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆128 · Updated 7 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Updated 6 months ago
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆68 · Updated 9 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 · Updated this week
- ☆64 · Updated this week
- PeRL: Parameter-Efficient Reinforcement Learning ☆68 · Updated 3 weeks ago
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge ☆104 · Updated last week
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆89 · Updated last year
- ☆84 · Updated 3 months ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments ☆177 · Updated 3 weeks ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 · Updated 6 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆60 · Updated 3 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆120 · Updated last month
- Official Repository of Native Parallel Reasoner ☆100 · Updated this week
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Updated last month
- [ICLR 2026] Geometric-Mean Policy Optimization ☆99 · Updated 2 weeks ago
- The official github repo for "Diffusion Language Models are Super Data Learners". ☆221 · Updated 3 months ago
- Kinetics: Rethinking Test-Time Scaling Laws ☆86 · Updated 6 months ago
- Esoteric Language Models ☆111 · Updated this week
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆60 · Updated 2 weeks ago
- ☆75 · Updated 7 months ago