guanyilin428 / Dynamic-Speculative-Planning
☆34 · Updated 3 months ago
Alternatives and similar repositories for Dynamic-Speculative-Planning
Users interested in Dynamic-Speculative-Planning are comparing it to the libraries listed below
- Dynamic Context Selection for Efficient Long-Context LLMs ☆50 · Updated 7 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆110 · Updated 2 months ago
- ☆61 · Updated 7 months ago
- ☆72 · Updated 6 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆136 · Updated 3 weeks ago
- Kinetics: Rethinking Test-Time Scaling Laws ☆85 · Updated 6 months ago
- ☆109 · Updated 3 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆39 · Updated 10 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆108 · Updated 2 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆63 · Updated last week
- ☆126 · Updated 7 months ago
- [CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression> ☆153 · Updated last month
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆28 · Updated 2 months ago
- AI-Driven Research Systems (ADRS) ☆113 · Updated 3 weeks ago
- ☆118 · Updated last month
- ☆69 · Updated last month
- ☆112 · Updated last year
- Official implementation of paper "Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models" ☆58 · Updated 3 weeks ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆52 · Updated last year
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training. ☆216 · Updated 7 months ago
- Implementation for FP8/INT8 Rollout for RL training without performance drop. ☆281 · Updated 2 months ago
- ☆27 · Updated 9 months ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ☆34 · Updated 7 months ago
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆102 · Updated 6 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆128 · Updated 6 months ago
- ☆133 · Updated 7 months ago
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning ☆57 · Updated 3 weeks ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments ☆165 · Updated last month
- Lottery Ticket Adaptation ☆40 · Updated last year
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆55 · Updated last year