NVlabs / QeRL
QeRL enables RL for 32B LLMs on a single H100 GPU.
☆287Updated this week
Alternatives and similar repositories for QeRL
Users who are interested in QeRL are comparing it to the libraries listed below.
- ☆27Updated 4 months ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning☆192Updated 2 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)☆51Updated last week
- Geometric-Mean Policy Optimization☆84Updated last week
- Landing repository for the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling"☆36Updated last month
- [NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting☆60Updated 4 months ago
- ☆19Updated 7 months ago
- ☆31Updated 3 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin…☆43Updated last month
- The official github repo for "Diffusion Language Models are Super Data Learners".☆134Updated 2 weeks ago
- Esoteric Language Models☆101Updated last week
- Resa: Transparent Reasoning Models via SAEs☆43Updated 3 weeks ago
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25]☆32Updated last month
- The official repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"☆220Updated this week
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models☆260Updated 3 weeks ago
- ☆105Updated 3 weeks ago
- [NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok…☆52Updated this week
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…☆53Updated 7 months ago
- ☆124Updated last week
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache☆125Updated 2 months ago
- Official PyTorch implementation of TokenSet.☆125Updated 6 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling.☆345Updated 3 months ago
- Official repo of paper LM2☆46Updated 8 months ago
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models".☆23Updated 2 weeks ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models☆220Updated last month
- ☆96Updated last month
- ☆44Updated last month
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)☆35Updated 7 months ago
- ☆84Updated 6 months ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".☆135Updated this week