NoakLiu / LLMEasyQuant

A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]

☆26

Alternatives and similar repositories for LLMEasyQuant

Users who are interested in LLMEasyQuant compare it to the libraries listed below.

NoakLiu / GraphSnapShot
GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System]
☆40Updated 3 weeks ago
RUCKBReasoning / LLM-Streamline
Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models"
☆30Updated 5 months ago
NoakLiu / DRTR
Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model]
☆10Updated 8 months ago
fscdc / Awesome-Efficient-Reasoning-Models
[TMLR 2025] Efficient Reasoning Models: A Survey
☆271Updated this week
fxmeng / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
☆382Updated 3 weeks ago
wangqinsi1 / Dobi-SVD
Official code implementation for 2025 ICLR accepted paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives"
☆45Updated 2 weeks ago
ZHITENGLI / ARB-LLM
[ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models
☆27Updated 2 months ago
liyunqianggyn / Awesome-LLMs-Pruning
All-in-one repository of awesome LLM pruning papers, integrating all useful resources and insights.
☆125Updated 2 months ago
kyegomez / SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien…
☆125Updated 2 weeks ago
UbiquitousLearning / Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
☆258Updated last year
October2001 / Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆559Updated 2 weeks ago
2018cx / Multi-Level-OT
PyTorch implementation of "Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models", AAAI 2…
☆37Updated 5 months ago
withinmiaov / A-Survey-on-Mixture-of-Experts-in-LLMs
[TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
☆435Updated 2 months ago
NoakLiu / MT2ST
Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model]
☆12Updated 4 months ago
Feng-Hong / DivBS
[ICML 2024] PyTorch implementation for "Diversified Batch Selection for Training Acceleration"
☆10Updated last year
ModelTC / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆39Updated last year
ZO-Bench / ZO-LLM
[ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".
☆111Updated 3 months ago
parsa-epfl / quantization-sparsity-interplay
This repo contains the code for studying the interplay between quantization and sparsity methods
☆23Updated 7 months ago
DavidFanzz / SCMoE
☆28Updated last year
zimingyy / SubZero
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces (ICCV 2025)
☆15Updated 10 months ago
NoakLiu / Awesome-Efficient-Foundation-Models-Design
Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model]
☆25Updated 7 months ago
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆66Updated 6 months ago
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆156Updated 3 weeks ago
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆135Updated 3 months ago
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆171Updated last year
mzf666 / LORO-main
Official implementation of ICLR 2025 'LORO: Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization'
☆12Updated 5 months ago
ThisisBillhe / ZipCache
[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
☆29Updated 6 months ago
mit-han-lab / sparsevit
[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
☆74Updated last year
aim-uofa / LoRAPrune
☆59Updated 10 months ago
Clin0212 / HydraLoRA
[NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
☆227Updated 10 months ago