Infini-AI-Lab / MagicPIG
MagicPIG: LSH Sampling for Efficient LLM Generation
★59 · updated 3 weeks ago
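The title names LSH sampling as the core technique. As a rough, hedged illustration of the general idea (not MagicPIG's actual algorithm or API), the PyTorch sketch below buckets cached keys with SimHash-style random hyperplanes so that a query only considers the keys it collides with; the function name, shapes, and the 8-bit hash width are all assumptions made for the example.

```python
# Minimal SimHash-style LSH bucketing sketch (illustrative only, not MagicPIG's code).
import torch

def simhash_codes(x: torch.Tensor, planes: torch.Tensor) -> torch.Tensor:
    """Map each row of x to an integer hash code via random-hyperplane sign bits."""
    bits = (x @ planes.T) > 0                          # [n, n_bits] sign pattern
    weights = 2 ** torch.arange(planes.shape[0])       # pack the bits into one integer
    return (bits.long() * weights).sum(-1)             # [n] bucket ids

torch.manual_seed(0)
d, n_keys, n_bits = 64, 1024, 8
planes = torch.randn(n_bits, d)          # shared random hyperplanes (hypothetical setup)
keys = torch.randn(n_keys, d)            # stand-in for cached key vectors
query = torch.randn(d)

key_codes = simhash_codes(keys, planes)                  # hash every cached key once
q_code = simhash_codes(query.unsqueeze(0), planes)[0]    # hash the incoming query

# Candidate keys: those falling in the same bucket as the query.
candidates = (key_codes == q_code).nonzero(as_tuple=True)[0]
print(f"query touches {candidates.numel()} of {n_keys} cached keys")
```

In a sampling-based attention scheme, the colliding keys would then be scored and reweighted rather than treated uniformly; this sketch stops at candidate selection.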
Related projects
Alternatives and complementary repositories for MagicPIG
- Must-read papers on KV Cache Compression (constantly updating) · ★143 · updated this week
- Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes · ★114 · updated 2 weeks ago
- 16-fold memory access reduction with nearly no loss · ★58 · updated last week
- Code for Palu: Compressing KV-Cache with Low-Rank Projection · ★57 · updated this week
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference (the query-aware top-k idea is sketched after this list) · ★203 · updated 3 weeks ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** · ★140 · updated 5 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM · ★149 · updated 4 months ago
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline" · ★74 · updated last year
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … · ★106 · updated 4 months ago
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference · ★39 · updated this week
- The official implementation of the paper SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction · ★39 · updated last month
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… · ★50 · updated last month
- Multi-Candidate Speculative Decoding · ★28 · updated 7 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity · ★181 · updated last year
- PyTorch library for cost-effective, fast and easy serving of MoE models · ★103 · updated 3 months ago
- QAQ: Quality Adaptive Quantization for LLM KV Cache · ★42 · updated 7 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) · ★190 · updated 3 weeks ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank · ★14 · updated 2 weeks ago
- Sirius, an efficient correction mechanism that significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… · ★19 · updated 2 months ago
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models · ★13 · updated last month
- TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention · ★21 · updated last month
- A sparse attention kernel supporting mixed sparse patterns · ★58 · updated last month
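Several of the entries above (Quest, Ada-KV, TidalDecode, the mixed-pattern sparse attention kernel) center on query-aware sparsity: score cached keys against the current query and attend only to a small top-k subset. The sketch below shows that general select-then-softmax pattern in plain PyTorch; it is not taken from any of these repositories, and the function name, the flat (non-paged) cache layout, and the budget k=64 are assumptions.

```python
# Query-aware top-k sparse attention sketch (illustrative only).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, K, V, k=64):
    """Attend only to the k cached positions whose keys score highest against q."""
    scores = K @ q                                   # [n_keys] relevance of each cached key
    idx = scores.topk(min(k, K.shape[0])).indices    # indices of the top-k keys
    attn = F.softmax(K[idx] @ q / K.shape[-1] ** 0.5, dim=-1)  # softmax over the subset only
    return attn @ V[idx]                             # weighted sum of the selected values

torch.manual_seed(0)
d, n_keys = 64, 4096
K, V, q = torch.randn(n_keys, d), torch.randn(n_keys, d), torch.randn(d)
out = topk_sparse_attention(q, K, V, k=64)
print(out.shape)  # torch.Size([64])
```

Real systems usually avoid scoring every key directly, for example by summarizing pages of keys and ranking the summaries, but the underlying select-then-attend structure is the same.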