UbiquitousLearning / Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
☆220Updated last month
Related projects
Alternatives and complementary repositories for Efficient_Foundation_Model_Survey
- ☆95Updated 10 months ago
- Awesome list for LLM pruning.☆167Updated this week
- a curated list of high-quality papers on resource-efficient LLMs 🌱☆79Updated last week
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗).☆136Updated this week
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️☆471Updated last week
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)☆188Updated 3 weeks ago
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.☆391Updated 3 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…☆175Updated 2 weeks ago
- ☆146Updated last month
- Awesome list for LLM quantization☆127Updated this week
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**☆138Updated 5 months ago
- Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.☆106Updated last week
- PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".☆74Updated last year
- Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.☆38Updated last week
- ☆289Updated 7 months ago
- [TMLR 2024] Efficient Large Language Models: A Survey☆1,025Updated last week
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.☆110Updated last month
- ☆70Updated 2 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…☆311Updated 2 months ago
- paper and its code for AI System☆215Updated 2 months ago
- ☆58Updated 3 months ago
- Fast inference from large language models via speculative decoding (see the sketch after this list)☆569Updated 2 months ago
- A list of papers, docs, codes about efficient AIGC. This repo is aimed to provide the info for efficient AIGC research, including languag…☆153Updated 2 weeks ago
- A large-scale simulation framework for LLM inference☆277Updated last month
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache (see the quantization sketch after this list)☆241Updated last month
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference☆202Updated 2 weeks ago
- ☆188Updated 6 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆134Updated 5 months ago
- A low-latency & high-throughput serving engine for LLMs☆245Updated 2 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM☆147Updated 4 months ago
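
Several entries above (the speculative decoding paper list, Spec-Bench, and Draft & Verify) revolve around the same draft-then-verify idea: a cheap draft model proposes several tokens and the expensive target model verifies them in one pass. The toy sketch below illustrates only the accept/reject rule; `draft_dist`, `target_dist`, and the 8-token vocabulary are hypothetical stand-ins, not code from any listed repository.

```python
# Minimal, self-contained sketch of speculative decoding over a toy vocabulary.
# Illustrative only: the two distributions below stand in for a small draft model
# and a large target model (hypothetical placeholders, not any repo's API).
import random

VOCAB = list(range(8))  # toy vocabulary of 8 token ids

def draft_dist(context):
    """Cheap 'draft model': a fixed, slightly skewed distribution."""
    weights = [(i % 4) + 1 for i in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def target_dist(context):
    """Expensive 'target model': the distribution we actually want to sample from."""
    weights = [(i % 3) + 1 for i in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def sample(dist):
    return random.choices(VOCAB, weights=dist, k=1)[0]

def speculative_step(context, k=4):
    """Draft k tokens cheaply, then accept/reject each against the target model.

    A drafted token x is accepted with probability min(1, p_target(x) / p_draft(x));
    on the first rejection we resample from the residual max(0, p_t - p_d) distribution.
    """
    accepted = []
    for _ in range(k):
        q = draft_dist(context + accepted)
        p = target_dist(context + accepted)
        x = sample(q)
        if random.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)  # token accepted: keep drafting
        else:
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            norm = sum(residual) or 1.0  # guards the degenerate all-zero case
            accepted.append(sample([r / norm for r in residual]))
            return accepted  # stop after the corrective sample
    # All k drafts accepted: take one bonus token directly from the target model.
    accepted.append(sample(target_dist(context + accepted)))
    return accepted

if __name__ == "__main__":
    random.seed(0)
    print(speculative_step(context=[1, 2, 3]))
```

Because every accepted token is provably distributed as if it had been sampled from the target model alone, the speedup comes purely from verifying several drafted tokens in a single target-model pass.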
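
Likewise, the KV-cache entries (KIVI, GEAR, and the KV cache compression paper lists) center on storing keys and values at low precision. Below is a minimal sketch of asymmetric low-bit quantization in NumPy; the per-channel vs. per-token axis choice mirrors KIVI's description, but the code is an illustration under those assumptions, not any repository's implementation.

```python
# Minimal sketch of asymmetric low-bit quantization as applied to KV-cache tensors.
# Library-free illustration (NumPy only); not KIVI's or GEAR's actual code.
import numpy as np

def quantize_asym(x, bits=2, axis=-1):
    """Asymmetric per-group quantization: map x to integers in [0, 2**bits - 1]."""
    qmax = (1 << bits) - 1
    x_min = x.min(axis=axis, keepdims=True)
    x_max = x.max(axis=axis, keepdims=True)
    scale = (x_max - x_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on flat groups
    q = np.clip(np.round((x - x_min) / scale), 0, qmax).astype(np.uint8)
    return q, scale, x_min

def dequantize(q, scale, x_min):
    return q.astype(np.float32) * scale + x_min

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((4, 16)).astype(np.float32)  # [tokens, head_dim]
    # KIVI-style choice: quantize keys per channel (axis=0); values would use per-token groups.
    q, s, m = quantize_asym(keys, bits=2, axis=0)
    err = np.abs(dequantize(q, s, m) - keys).mean()
    print(f"mean abs reconstruction error at 2 bits: {err:.3f}")
```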