NoakLiu / LLMEasyQuant
A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
☆20 Updated last week
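LLMEasyQuant's subject is post-training quantization: mapping a model's floating-point weights to low-bit integers plus a scale factor so they are cheaper to store and serve. Below is a minimal sketch of symmetric per-tensor INT8 quantization in PyTorch, as an illustration of the general technique only, not LLMEasyQuant's actual API.

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-tensor INT8: a single scale maps the largest-magnitude
    # weight to 127. Generic sketch, not LLMEasyQuant's interface.
    scale = (w.abs().max() / 127.0).clamp(min=1e-8)  # avoid division by zero
    q = (w / scale).round().clamp(-128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate float tensor for computation.
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)  # stand-in for one weight matrix
q, s = quantize_int8(w)
err = (w - dequantize_int8(q, s)).abs().max()
print(f"max reconstruction error: {err.item():.4f} "
      f"(bounded by ~scale/2 = {(s / 2).item():.4f})")
```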
Alternatives and similar repositories for LLMEasyQuant
Users interested in LLMEasyQuant are comparing it to the libraries listed below.
- GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System] ☆33 Updated last week
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆10 Updated 4 months ago
- Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model] ☆23 Updated 4 months ago
- Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model] ☆10 Updated last month
- An all-in-one repository of awesome LLM pruning papers, integrating useful resources and insights. ☆93 Updated 6 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆459 Updated this week
- The official code implementation of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ☆46 Updated 2 weeks ago
- Paper List of Inference/Test Time Scaling/Computing ☆264 Updated last week
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆18 Updated last month
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆29 Updated last month
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆135 Updated last month
- Survey Paper List - Efficient LLM and Foundation Models ☆249 Updated 9 months ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆105 Updated 11 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆110 Updated 4 months ago
- The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆375 Updated last week
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆184 Updated this week
- Official code implementation for the ICLR 2025 paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives" ☆34 Updated 3 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆92 Updated last year
- ☆104 Updated 3 weeks ago
- Code release for VTW (AAAI 2025 Oral) ☆43 Updated 5 months ago
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆53 Updated 10 months ago
- Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes. ☆323 Updated 3 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆166 Updated this week
- DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.com/p/1218643… ☆21 Updated 3 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆297 Updated 7 months ago
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆161 Updated 8 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆281 Updated 2 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification (see the KV-cache quantization sketch after this list) ☆22 Updated 2 months ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆62 Updated 3 months ago
- Paper list for Efficient Reasoning. ☆509 Updated this week
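Several entries above (the KV Cache Compression paper lists and ZipCache) target the attention KV cache, which dominates memory at long context lengths. As a minimal sketch of the shared underlying idea, here is simple per-token symmetric INT8 quantization of a cached tensor; this is a hypothetical illustration, not the method of any repo listed.

```python
import torch

def quantize_kv_per_token(kv: torch.Tensor):
    # kv: [seq_len, num_heads, head_dim]. One scale per token position,
    # shared across heads and channels. Generic sketch only; not the
    # algorithm of ZipCache or any other repo above.
    scales = kv.abs().amax(dim=(1, 2), keepdim=True) / 127.0
    scales = scales.clamp(min=1e-8)  # avoid division by zero
    q = (kv / scales).round().clamp(-128, 127).to(torch.int8)
    return q, scales

def dequantize_kv(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Approximate reconstruction for use in attention.
    return q.to(torch.float16) * scales.to(torch.float16)

k = torch.randn(1024, 32, 128)            # cached keys for 1024 tokens
qk, sk = quantize_kv_per_token(k)
print(qk.element_size() * qk.nelement())  # INT8 storage: ~4x smaller than fp32,
                                          # plus one float scale per token
```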