haizhongzheng / LTE
☆10 · Updated 7 months ago
Alternatives and similar repositories for LTE
Users interested in LTE are comparing it to the repositories listed below.
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆79 · Updated 4 months ago
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆69 · Updated last year
- ☆57 · Updated last year
- ☆52 · Updated 6 months ago
- ☆24 · Updated last month
- ☆46 · Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆61 · Updated 2 months ago
- ☆30 · Updated last month
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆19 · Updated 6 months ago
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆39 · Updated last year
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆105 · Updated 11 months ago
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆88 · Updated 2 years ago
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆39 · Updated 2 months ago
- ☆42 · Updated 2 years ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25] ☆39 · Updated last month
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆123 · Updated 4 months ago
- ☆18 · Updated 3 months ago
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ☆13 · Updated 2 months ago
- Awesome LLM pruning papers: an all-in-one repository integrating useful resources and insights. ☆93 · Updated 6 months ago
- Squeezed Attention: Accelerating Long Prompt LLM Inference ☆47 · Updated 7 months ago
- ☆54 · Updated last year
- Accommodating Large Language Model Training over Heterogeneous Environment. ☆24 · Updated 3 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS '24) ☆40 · Updated 6 months ago
- An implementation of the DISP-LLM method from the NeurIPS 2024 paper: Dimension-Independent Structural Pruning for Large Language Models. ☆20 · Updated 2 months ago
- The official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act… ☆16 · Updated 8 months ago
- An experimentation platform for LLM inference optimisation ☆31 · Updated 9 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆71 · Updated 8 months ago
- A curated list of early-exiting works (LLM, CV, NLP, etc.) ☆55 · Updated 10 months ago
- ☆42 · Updated 7 months ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆108 · Updated 2 months ago