NoakLiu / LLMEasyQuant
A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
☆26 · Updated 7 months ago
Alternatives and similar repositories for LLMEasyQuant
Users interested in LLMEasyQuant are comparing it to the libraries listed below.
- GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System] ☆40 · Updated last month
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆10 · Updated last year
- [TMLR 2025] Efficient Reasoning Models: A Survey ☆298 · Updated last week
- Official code implementation for the ICLR 2025 accepted paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives" ☆50 · Updated 3 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗) ☆658 · Updated 4 months ago
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models" ☆482 · Updated 6 months ago
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆123 · Updated 7 months ago
- Code for the paper "Markovian Scale Prediction: A New Era of Visual Autoregressive Generation" ☆29 · Updated last month
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces (ICCV 2025) ☆15 · Updated last year
- Awesome list for LLM pruning ☆282 · Updated 4 months ago
- Paper list for Efficient Reasoning ☆822 · Updated last week
- Survey Paper List - Efficient LLM and Foundation Models ☆260 · Updated last year
- Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model] ☆12 · Updated 8 months ago
- A collection of AWESOME things about mixture-of-experts ☆1,259 · Updated last year
- ☆36 · Updated 3 years ago
- Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes ☆411 · Updated 11 months ago
- A curated list of high-quality papers on resource-efficient LLMs 🌱 ☆156 · Updated 10 months ago
- Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs ☆51 · Updated last week
- Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model] ☆28 · Updated 11 months ago
- [ECCV 2024] SparseRefine: Sparse Refinement for Efficient High-Resolution Semantic Segmentation ☆15 · Updated last year
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆153 · Updated 7 months ago
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆1,117 · Updated 2 weeks ago
- [NeurIPS 2024 Oral 🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs ☆180 · Updated last year
- ☆28 · Updated last year
- Official implementation of DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference ☆40 · Updated this week
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆46 · Updated last year
- [ICLR'25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs ☆18 · Updated 8 months ago
- A paper list of recent works on token compression for ViT and VLM ☆824 · Updated this week
- Awesome LLM pruning papers: an all-in-one repository integrating useful resources and insights ☆147 · Updated 6 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Updated last year