Infini-AI-Lab / TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆276 · Updated last year
Alternatives and similar repositories for TriForce
Users interested in TriForce are comparing it to the libraries listed below.
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆280 · Updated 8 months ago
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆275 · Updated 5 months ago
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆94 · Updated last year
- ☆300 · Updated 6 months ago
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆125 · Updated 2 months ago
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆161 · Updated 3 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆176 · Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆368 · Updated 6 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆113 · Updated 10 months ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆240 · Updated last year
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆153 · Updated 11 months ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆137 · Updated last year
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin… ☆65 · Updated last year
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆214 · Updated 11 months ago
- APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention ☆267 · Updated 2 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆187 · Updated 4 months ago
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆49 · Updated 6 months ago
- 16-fold memory access reduction with nearly no loss