taishan1994/LLM-Quantization


Notes summarizing what I have learned about quantizing LLMs.

☆79

Alternatives and similar repositories for LLM-Quantization

Users who are interested in LLM-Quantization are comparing it to the libraries listed below.


shijiew / QwenSpinQuant
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
☆15 · Mar 20, 2025 · Updated last year
DataXujing / YOLOv12-TensorRT
End-to-end accelerated inference and INT8 quantization for YOLOv12 with TensorRT
☆14 · Mar 5, 2025 · Updated last year
StiphyJay / MQuant
[ACM MM2025]: MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
☆44 · Aug 13, 2025 · Updated 11 months ago
AI-Efficiency / Qwen3-Quantization-Toolkit
☆79 · Sep 19, 2025 · Updated 10 months ago
ebby-s / MX-for-FPGA
Implementation of Microscaling data formats in SystemVerilog.
☆34 · Jul 6, 2025 · Updated last year
BrotherHappy / OSTQuant
[ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…
☆94 · Apr 8, 2025 · Updated last year
JingyangXiang / DFRot
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c…
☆30 · Mar 5, 2025 · Updated last year
a514154639 / opencv-ffmpeg-nvjmi-mpp
OpenCV with Jetson/RK3588 MPP hardware decoding: rewrites the open and read functions and supports H.264/H.265
☆14 · Nov 27, 2025 · Updated 8 months ago
kssteven418 / SqueezeLLM-gradients
☆21 · Feb 5, 2024 · Updated 2 years ago
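cqu20160901 / DETR_onnx_tensorRT_V2
DETR TensorRT: removes unused auxiliary heads from the inference pass, uses FP16 deployment for a further speedup, and offers a new fix for the all-zero-output problem after conversion to TensorRT.
☆12 · Jan 9, 2024 · Updated 2 years ago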
Qualcomm-AI-research / gptvq
☆42 · Mar 28, 2024 · Updated 2 years ago
ModelCloud / GPTQModel
LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM…
☆1,216 · Updated this week
richjjj / cuvid-tensorrt-multi
ffmpeg+cuvid+tensorrt+multicamera
☆12 · Dec 31, 2024 · Updated last year
DataXujing / Bert_TensorRT
Accelerated deployment of BERT models with TensorRT
☆10 · Apr 1, 2022 · Updated 4 years ago
ModelTC / LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
☆736 · May 14, 2026 · Updated 2 months ago
anminliu / VecAttention
[CVPR2026] VecAttention: Vector-wise Sparse Attention for Accelerating Long-Context Inference
☆20 · May 27, 2026 · Updated 2 months ago
ShaoqiangLu / DFVG
DFVG: A Heterogeneous Architecture for Speculative Decoding with Draft-on-FPGA and Verify-on-GPU.
☆25 · Nov 26, 2025 · Updated 8 months ago
cqu20160901 / FastSAM_rknn_Cplusplus
C++ code for deploying FastSAM with RKNN
☆13 · May 30, 2024 · Updated 2 years ago
GoatWu / APHQ-ViT
[CVPR 2025] APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
☆44 · Apr 7, 2025 · Updated last year
mz24cn / gemm_optimization
The repository targets performance optimization of the OpenCL gemm function. It compares several libraries: clBLAS, clBLAST, MIOpenGemm, Inte…
☆17 · Mar 28, 2019 · Updated 7 years ago
pprp / STBLLM
[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
☆20 · Jun 3, 2025 · Updated last year
spcl / QuaRot
Code for the NeurIPS'24 paper QuaRot: end-to-end 4-bit inference of large language models.
☆524 · Nov 26, 2024 · Updated last year
morsoli / llmbenchmark
Comparison of LLM API performance metrics: an in-depth analysis of key metrics such as TTFT and TPS
☆20 · Sep 12, 2024 · Updated last year
BaofengZan / my_trt_pro
Learning various topics by following Tensorrt_pro
☆39 · Nov 25, 2022 · Updated 3 years ago
Intelligent-Computing-Lab-Panda / GPTAQ
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆93 · Jul 28, 2025 · Updated last year
YanjingLi0202 / Bi-ViT
The official implementation of the AAAI 2024 paper Bi-ViT.
☆13 · Dec 18, 2023 · Updated 2 years ago
TRT2022 / ControlNet_TensorRT
Tianchi NVIDIA TensorRT Hackathon 2023: third-place solution in the preliminary round of the generative AI model optimization competition
☆50 · Aug 16, 2023 · Updated 2 years ago
wangzhaode / onnx-llm
LLM deployment project based on ONNX.
☆49 · Oct 9, 2024 · Updated last year
Phoenix8215 / learn-TensorRT-from-scratch
learn TensorRT from scratch 🥰
☆18 · Sep 29, 2024 · Updated last year
Xingyu-Zheng / FOEM
(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
☆16 · Apr 16, 2026 · Updated 3 months ago
richjjj / duscratch
A collection of saved code snippets
☆13 · Jun 6, 2023 · Updated 3 years ago
triple-mu / HunyuanDiT-TensorRT-libtorch
HunyuanDiT with TensorRT and libtorch
☆18 · May 22, 2024 · Updated 2 years ago
pprp / Awesome-LLM-Quantization
Awesome list for LLM quantization
☆436 · Apr 20, 2026 · Updated 3 months ago
mailliw2010 / infer-frame
An AI infra framework for edge devices, based on nndeploy
☆18 · Nov 27, 2025 · Updated 8 months ago
nobbyfix / AzurLaneSourceJson
Azur Lane Lua files, but in JSON
☆11 · Sep 23, 2021 · Updated 4 years ago
Ranking666 / Base-quantization
Basic quantization methods, including QAT, PTQ, per_channel, per_tensor, dorefa, lsq, adaround, omse, Histogram, bias_correction, etc.
☆52 · Nov 2, 2022 · Updated 3 years ago
BaofengZan / mnn-llm-GOT-OCR2.0
Running GOT-OCR2.0 inference with mnn-llm
☆14 · Oct 2, 2024 · Updated last year
yuny220 / NAR-Former
PyTorch code of [CVPR 2023] "NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction".
☆11 · Mar 14, 2023 · Updated 3 years ago
ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223 · Nov 25, 2025 · Updated 8 months ago