hao-ai-lab / Dynasor
Simple extension on vLLM to help you speed up reasoning models without training.
☆152 · Updated this week
Alternatives and similar repositories for Dynasor
Users interested in Dynasor are comparing it to the libraries listed below:
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆161 · Updated 11 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆106 · Updated 2 months ago
- ☆79 · Updated 4 months ago
- ☆248 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 6 months ago
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆107 · Updated 2 weeks ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) ☆70 · Updated 2 months ago
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆214 · Updated 5 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25] ☆37 · Updated 2 weeks ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆463 · Updated 3 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness ☆31 · Updated last month
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 · Updated 11 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆84 · Updated 11 months ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆129 · Updated last week
- EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs) ☆63 · Updated 11 months ago
- ☆50 · Updated 6 months ago
- KV cache compression for high-throughput LLM inference ☆129 · Updated 3 months ago
- Reproducing R1 for Code with Reliable Rewards ☆201 · Updated 3 weeks ago
- ☆129 · Updated 3 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆291 · Updated 6 months ago
- Async pipelined version of Verl ☆91 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆186 · Updated 2 months ago
- ☆45 · Updated last year
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…" ☆57 · Updated 11 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆213 · Updated 3 weeks ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM ☆74 · Updated 5 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆166 · Updated last week
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆102 · Updated 4 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆143 · Updated 8 months ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆202 · Updated 6 months ago