[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆138 Dec 5, 2025 Updated 2 months ago
Alternatives and similar repositories for fastrl
Users who are interested in fastrl are comparing it to the libraries listed below.
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) ☆37 Feb 10, 2026 Updated 3 weeks ago
- ☆226 Nov 19, 2025 Updated 3 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆93 Dec 2, 2025 Updated 3 months ago
- ☆38 Aug 7, 2025 Updated 6 months ago
- ☆149 Feb 25, 2026 Updated last week
- ☆20 Dec 24, 2024 Updated last year
- APEX+ is an LLM Serving Simulator ☆42 Jun 16, 2025 Updated 8 months ago
- Xmixers: A collection of SOTA efficient token/channel mixers ☆28 Sep 4, 2025 Updated 6 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆144 Dec 4, 2024 Updated last year
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆91 Feb 23, 2026 Updated last week
- Official implementation of paper "Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models" ☆66 Jan 13, 2026 Updated last month
- [ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation ☆87 Feb 7, 2026 Updated 3 weeks ago
- ☆11 Oct 11, 2023 Updated 2 years ago
- a simple API to use CUPTI ☆11 Aug 19, 2025 Updated 6 months ago
- An experimental communicating attention kernel based on DeepEP. ☆35 Jul 29, 2025 Updated 7 months ago
- Code for Research Project TLDR ☆25 Jul 28, 2025 Updated 7 months ago
- Code for the paper "Modelling Latent Translations for Cross-Lingual Transfer" ☆17 Nov 22, 2021 Updated 4 years ago
- ☆11 Apr 5, 2021 Updated 4 years ago
- Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” ☆130 Updated this week
- ☆12 Oct 16, 2022 Updated 3 years ago
- A record of reading list on some MLsys popular topic ☆22 Mar 20, 2025 Updated 11 months ago
- [ECCV 2024] SparseRefine: Sparse Refinement for Efficient High-Resolution Semantic Segmentation ☆14 Jan 10, 2025 Updated last year
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆92 Jan 26, 2026 Updated last month
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆57 Jul 23, 2024 Updated last year
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ☆74 Jul 14, 2025 Updated 7 months ago
- Repository for GeoUni, A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions. ☆19 Jun 12, 2025 Updated 8 months ago
- Official code for the NeurIPS25 paper "RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling" (https://arxiv.org/abs/25… ☆23 Dec 10, 2025 Updated 2 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆42 Feb 13, 2025 Updated last year
- ☆87 Updated this week
- Transformers components but in Triton ☆34 May 9, 2025 Updated 9 months ago
- DFlash: Block Diffusion for Flash Speculative Decoding ☆593 Feb 18, 2026 Updated 2 weeks ago
- A challenging aggregation benchmark for long-context models ☆37 Feb 22, 2026 Updated last week
- Debug print operator for cudagraph debugging ☆14 Aug 2, 2024 Updated last year
- "FusionFactory: Fusing LLM Capabilities with Routing Data", Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Sh… ☆19 Dec 30, 2025 Updated 2 months ago
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆358 Jan 12, 2026 Updated last month
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆71 Jul 5, 2025 Updated 7 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆127 Nov 10, 2025 Updated 3 months ago
- ☆37 Sep 13, 2025 Updated 5 months ago
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆51 Jul 15, 2025 Updated 7 months ago