thu-nics / qllm-eval

Code repository for the paper "Evaluating Quantized Large Language Models"

☆135

Alternatives and similar repositories for qllm-eval

Users who are interested in qllm-eval are comparing it to the libraries listed below.


imagination-research / EEP
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆23 · Nov 11, 2025 · Updated 3 months ago
thu-nics / MBQ
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆75 · Mar 17, 2025 · Updated 10 months ago
ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆211 · Nov 25, 2025 · Updated 2 months ago
spcl / QuaRot
Code for the NeurIPS 2024 paper QuaRot, which enables end-to-end 4-bit inference of large language models.
☆482 · Nov 26, 2024 · Updated last year
ModelTC / LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
☆675 · Nov 19, 2025 · Updated 2 months ago
mit-han-lab / omniserve
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆812 · Mar 6, 2025 · Updated 11 months ago
jy-yuan / KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆356 · Nov 20, 2025 · Updated 2 months ago
thu-nics / nicsefc-readme
Some docs for rookies in nics-efc
☆22 · Mar 17, 2022 · Updated 3 years ago
nunchaku-ai / deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
☆753 · Aug 14, 2025 · Updated 6 months ago
ModelTC / Outlier_Suppression_Plus
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆50 · Oct 21, 2023 · Updated 2 years ago
thu-nics / CLAP-triangle-counting
[DATE'23] The official code for the paper <CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory>
☆23 · Jan 19, 2026 · Updated 3 weeks ago
ByteDance-Seed / decoupleQ
A quantization algorithm for LLMs
☆148 · Jun 21, 2024 · Updated last year
Aaronhuang-778 / SliM-LLM
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆51 · Aug 9, 2024 · Updated last year
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆180 · Oct 3, 2024 · Updated last year
JingyangXiang / DFRot
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c…
☆29 · Mar 5, 2025 · Updated 11 months ago
thu-nics / MixDQ
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
☆49 · Nov 27, 2024 · Updated last year
ChenMnZ / INT_vs_FP
A framework to compare low-bit integer and floating-point formats
☆66 · Feb 6, 2026 · Updated last week
nbasyl / LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
☆220 · Dec 15, 2023 · Updated 2 years ago
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆37 · Sep 24, 2024 · Updated last year
nbasyl / OFQ
The official implementation of the ICML 2023 paper OFQ-ViT
☆37 · Oct 3, 2023 · Updated 2 years ago
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆336 · Jul 2, 2024 · Updated last year
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆123 · Jul 4, 2025 · Updated 7 months ago
walkerning / aw_nas
aw_nas: A Modularized and Extensible NAS Framework
☆252 · Nov 25, 2025 · Updated 2 months ago
AboveParadise / LLMCBench
☆28 · Dec 2, 2024 · Updated last year
sjtu-zhao-lab / SALO
An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences
☆31 · Mar 7, 2024 · Updated last year
facebookresearch / LLM-QAT
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
☆322 · Mar 4, 2025 · Updated 11 months ago
PingchengDong / GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
☆20 · Dec 20, 2024 · Updated last year
fuvty / DeSCo
[WSDM'24 Oral] The official implementation of the paper <DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting>
☆23 · Mar 11, 2024 · Updated last year
HuangOwen / QAT-ACS
[TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆37 · Aug 20, 2024 · Updated last year
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆68 · Mar 7, 2024 · Updated last year
Intelligent-Computing-Lab-Panda / GPTAQ
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆81 · Jul 28, 2025 · Updated 6 months ago
IST-DASLab / QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024)
☆184 · Apr 16, 2024 · Updated last year
machilusZ / FastGen
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
☆44 · Aug 14, 2024 · Updated last year
thu-nics / FrameFusion
[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
☆69 · Jan 13, 2026 · Updated last month
thu-nics / R2R
[NeurIPS'25] The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok…
☆78 · Updated this week
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆372 · Jul 10, 2025 · Updated 7 months ago
HuangOwen / Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
☆1,776 · Nov 10, 2025 · Updated 3 months ago
dubcyfor3 / Focus
[HPCA 2026 Best Paper Candidate] Official implementation of "Focus: A Streaming Concentration Architecture for Efficient Vision-Language …
☆29 · Updated this week
UNITES-Lab / C2R-MoE
[NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…
☆14 · Feb 4, 2025 · Updated last year
