ZongwuWang/MILLION

This repository presents the source code for the paper "MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization" (DAC'25).

☆25

Alternatives and similar repositories for MILLION

Users who are interested in MILLION are comparing it to the repositories listed below.

yc2367 / BBS-MICRO
☆19 · Nov 11, 2024 · Updated last year
CMU-SAFARI / transpimlib
TransPimLib is a library for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems. TransPimLib provides …
☆16 · Apr 21, 2023 · Updated 3 years ago
abhibambhaniya / progressive_gradient_flow_nm_sparsity
Implementation of the N:M sparsity recipe presented in the paper "Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers".
☆11 · Feb 5, 2024 · Updated 2 years ago
imagination-research / EEP
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆25 · Nov 11, 2025 · Updated 8 months ago
RuokaiYin / LoAS
LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks, MICRO 2024.
☆19 · Mar 19, 2025 · Updated last year
Hazuyuki / PIM-HLS
☆12 · Aug 18, 2023 · Updated 2 years ago
cds-ruc / PIM-ANNS
☆19 · Jul 3, 2026 · Updated 2 weeks ago
IST-DASLab / gemm-int8
High Performance Int8 GEMM Kernels for SM80 and later GPUs.
☆23 · Mar 11, 2025 · Updated last year
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆37 · Feb 21, 2024 · Updated 2 years ago
Qualcomm-AI-research / gptvq
☆42 · Mar 28, 2024 · Updated 2 years ago
UMass-Embodied-AGI / CommVQ
[ICML 2025] CommVQ: Commutative Vector Quantization for KV Cache Compression
☆27 · Sep 2, 2025 · Updated 10 months ago
VITA-Group / R-Sparse
[ICLR'25] R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
☆21 · Apr 28, 2025 · Updated last year
kelvin0207 / SparSynergy
Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib…
☆26 · Mar 29, 2025 · Updated last year
MOSS2023ASE / service-frontend
☆11 · Jun 11, 2023 · Updated 3 years ago
MOSS2023ASE / service-backend
☆10 · Jun 10, 2023 · Updated 3 years ago
Kimho666 / LLM_Hardware_Survey
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
☆18 · Jul 15, 2025 · Updated last year
mohamed / roofline
A simple script to plot the Roofline model for given HW platforms and applications
☆10 · Mar 17, 2026 · Updated 4 months ago
Thysrael / dotfiles
Thysrael's naive dotfiles.
☆17 · Updated this week
ConvolutedDog / gpgpu-sim-comments
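GPGPU-Sim code with Chinese comments: the latest version of the GPGPU-Sim simulator source, annotated in Chinese to help Chinese-speaking users better understand and use the simulator.
☆30 · Dec 18, 2024 · Updated last year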
snu-comparch / Tender
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)
☆34 · Jul 4, 2024 · Updated 2 years ago
DDMXIE / Influence-maximization-on-hypergraphs
Influence-maximization-on-hypergraphs
☆14 · Jun 6, 2022 · Updated 4 years ago
iankur / vqllm
Residual vector quantization for KV cache compression in large language models
☆12 · Oct 22, 2024 · Updated last year
YJHMITWEB / ExFlow
Explore Inter-layer Expert Affinity in MoE Model Inference
☆16 · May 6, 2024 · Updated 2 years ago
MXHX7199 / ICCV_2021_AFP
AFP is a hardware-friendly quantization framework for DNNs, which is contributed by Fangxin Liu and Wenbo Zhao.
☆13 · Nov 8, 2021 · Updated 4 years ago
clevercool / ANT-Quantization
☆123 · Nov 17, 2023 · Updated 2 years ago
gem5 / m5threads
Lightweight threading library for gem5 syscall emulator (git mirror)
☆16 · Mar 1, 2017 · Updated 9 years ago
yc2367 / P3-LLM
☆23 · Apr 3, 2026 · Updated 3 months ago
SNU-ARC / DecDEC
[OSDI 2025] DecDEC: A Systems Approach to Advancing Low‑Bit LLM Quantization
☆26 · Jan 29, 2026 · Updated 5 months ago
BICLab / MetaLA
Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS2024 Oral)
☆36 · Jan 18, 2025 · Updated last year
insuhan / calibquant
☆21 · Apr 3, 2025 · Updated last year
gccnlp / Light-PEFT
[ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
☆13 · Sep 2, 2024 · Updated last year
ROCm / tensorcast
☆18 · Nov 10, 2025 · Updated 8 months ago
scalesim-project / scale-sim-v3
☆68 · Nov 29, 2025 · Updated 7 months ago
LMCache / lmcache-agent-trace
Agent application/benchmark/workload traces should be placed here.
☆15 · Apr 13, 2026 · Updated 3 months ago
cjyaras / deep-lora-transformers
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation (ICML'24 Oral)
☆12 · Jul 22, 2024 · Updated last year
dubcyfor3 / Prosperity
The official implementation of the HPCA 2025 paper, Prosperity: Accelerating Spiking Neural Networks via Product Sparsity
☆41 · Aug 9, 2025 · Updated 11 months ago
SJTU-ECTL / GOMIL
GOMIL: Global Optimization of Multiplier by Integer Linear Programming
☆13 · Aug 25, 2021 · Updated 4 years ago
Chun-Feng / CACTI-6.0
ASIC simulation of a multi-ported memory module. It can offer an SRAM-based dual-port basic building block to support multiple read/write …
☆25 · May 30, 2016 · Updated 10 years ago
IST-DASLab / DarwinLM
Official PyTorch implementation of the paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models"
☆20 · Feb 21, 2025 · Updated last year