Leosang-lx / FlowSpec
Continuous Pipelined Speculative Decoding
☆16 · Updated last month
Alternatives and similar repositories for FlowSpec
Users interested in FlowSpec are comparing it to the libraries listed below.
- ☆28 · Updated 8 months ago
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference ☆20 · Updated last year
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆49 · Updated 6 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆88 · Updated 2 months ago
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆52 · Updated 6 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Updated last year
- Code repo for efficient quantized MoE inference with mixture of low-rank compensators ☆31 · Updated 9 months ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…" ☆14 · Updated last year
- ☆32 · Updated 3 months ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆30 · Updated last year
- ☆15 · Updated last year
- ☆85 · Updated 9 months ago
- This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE" ☆36 · Updated 4 months ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025) ☆15 · Updated last month
- ☆53 · Updated 8 months ago
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ☆52 · Updated last year
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆126 · Updated 2 months ago
- ☆36 · Updated last year
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆40 · Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆69 · Updated last year
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ☆131 · Updated 2 months ago
- Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" ☆31 · Updated last year
- PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System] ☆48 · Updated 3 months ago
- Vocabulary Parallelism ☆25 · Updated 11 months ago
- Advancing the frontier of efficient AI ☆53 · Updated last week
- ☆22 · Updated 11 months ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆56 · Updated last year
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆147 · Updated last month
- Code for the paper [ICLR 2025 Oral] "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" ☆160 · Updated 4 months ago
- Preview Code for Continuum Paper ☆35 · Updated 2 weeks ago