tiingweii-shii / Awesome-Resource-Efficient-LLM-Papers
a curated list of high-quality papers on resource-efficient LLMs 🌱
☆93 · Updated 2 weeks ago
Alternatives and similar repositories for Awesome-Resource-Efficient-LLM-Papers:
Users who are interested in Awesome-Resource-Efficient-LLM-Papers are comparing it to the libraries listed below.
- Survey Paper List - Efficient LLM and Foundation Models ☆238 · Updated 3 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆205 · Updated 3 weeks ago
- Awesome list for LLM pruning. ☆192 · Updated last month
- Awesome list for LLM quantization ☆156 · Updated 3 weeks ago
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆81 · Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆51 · Updated 3 weeks ago
- ☆99 · Updated last year
- The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆200 · Updated this week
- 📰 Must-read papers on KV Cache Compression (constantly updating). ☆256 · Updated this week
- Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes. ☆186 · Updated last month
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆214 · Updated 2 months ago
- ☆43 · Updated 2 weeks ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆61 · Updated 2 weeks ago
- ATC'23 AE ☆44 · Updated last year
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆151 · Updated 7 months ago
- Implementations of some LLM KV cache sparsity methods ☆30 · Updated 7 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆152 · Updated 6 months ago
- ☆72 · Updated 2 years ago
- ☆49 · Updated last year
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆42 · Updated 2 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆195 · Updated last year
- ☆40 · Updated last month
- All-in-one repository of awesome LLM pruning papers, integrating useful resources and insights. ☆58 · Updated last month
- ☆51 · Updated 9 months ago
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆54 · Updated 3 weeks ago
- A Mixture-of-Experts (MoE) implementation for PyTorch, [ATC'23] SmartMoE ☆61 · Updated last year
- The official code for the paper "Parallel Speculative Decoding with Adaptive Draft Length". ☆32 · Updated 4 months ago
- ☆50 · Updated 3 months ago
- Multi-Candidate Speculative Decoding ☆33 · Updated 8 months ago
- LLM Serving Performance Evaluation Harness ☆65 · Updated 4 months ago