lmgame-org / GRL
Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning
☆58 Updated last month
Alternatives and similar repositories for GRL
Users who are interested in GRL are comparing it to the libraries listed below.
- Defeating the Training-Inference Mismatch via FP16 ☆180 Updated 2 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆110 Updated 3 months ago
- ☆104 Updated 11 months ago
- ☆63 Updated 7 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆115 Updated 2 months ago
- ☆110 Updated 4 months ago
- DPO, but faster 🚀 ☆46 Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆128 Updated 7 months ago
- Memory optimized Mixture of Experts ☆72 Updated 6 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 Updated last year
- Kinetics: Rethinking Test-Time Scaling Laws ☆85 Updated 6 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆229 Updated 7 months ago
- Esoteric Language Models ☆109 Updated 2 months ago
- ☆133 Updated 8 months ago
- ☆220 Updated 2 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 Updated last month
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆47 Updated 6 months ago
- ☆269 Updated 7 months ago
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support ☆259 Updated this week
- ☆71 Updated 2 weeks ago
- Block Diffusion for Ultra-Fast Speculative Decoding ☆432 Updated this week
- ☆85 Updated 2 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 Updated 6 months ago
- Linear Attention Sequence Parallelism (LASP) ☆88 Updated last year
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments ☆175 Updated 2 weeks ago
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆477 Updated 2 months ago
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆64 Updated this week
- Spectral Sphere Optimizer ☆90 Updated 2 weeks ago
- ☆54 Updated last year
- ☆91 Updated last year