haizhongzheng / LTELinks

☆12

Alternatives and similar repositories for LTE

Users that are interested in LTE are comparing it to the libraries listed below

Sorting:

henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆86Updated 9 months ago
LiuXiaoxuanPKU / OSD
☆61Updated 11 months ago
wangqinsi1 / CoreInfer
This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…
☆17Updated last year
shadowpa0327 / Palu
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
☆148Updated 9 months ago
luuyin / OWL
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆74Updated 4 months ago
imagination-research / EEP
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆21Updated 2 weeks ago
liyunqianggyn / Awesome-LLMs-Pruning
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆134Updated 3 months ago
machilusZ / FastGen
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
☆41Updated last year
zyxxmu / cam
Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆47Updated last year
IST-DASLab / EvoPress
☆37Updated last week
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆94Updated 2 years ago
stephenqz / OATS
Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition
☆15Updated 7 months ago
hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆80Updated last year
ZO-Bench / ZO-LLM
[ICML‘24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ".
☆117Updated 4 months ago
FMInference / DejaVu
☆347Updated last year
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆353Updated 4 months ago
October2001 / Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆608Updated 2 months ago
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆336Updated 7 months ago
yxli2123 / LoSparse
☆62Updated 2 years ago
jy-yuan / KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆335Updated last week
falcon-xu / early-exit-papers
A curated list of early exiting (LLM, CV, NLP, etc)
☆68Updated last year
biomedical-cybernetics / Relative-importance-and-activation-pruning
☆52Updated last year
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆121Updated 4 months ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆114Updated 2 months ago
hdong920 / LESS
☆53Updated last year
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆67Updated 8 months ago
thu-nics / qllm-eval
Code Repository of Evaluating Quantized Large Language Models
☆137Updated last year
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆108Updated 8 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆170Updated last year
wangqinsi1 / Dobi-SVD
Official code implementation for 2025 ICLR accepted paper "Dobi-SVD : Differentiable SVD for LLM Compression and Some New Perspectives"
☆49Updated last month