megvii-research / SparsebitLinks

A model compression and acceleration toolbox based on pytorch.

☆332

Alternatives and similar repositories for Sparsebit

Users that are interested in Sparsebit are comparing it to the libraries listed below

Sorting:

megvii-research / FQ-ViT
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
☆355Updated 2 years ago
ModelTC / Dipoorlet
Offline Quantization Tools for Deploy.
☆141Updated last year
Qualcomm-AI-research / transformer-quantization
☆207Updated 4 years ago
tianyic / only_train_once_personal_footprint
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
☆309Updated last year
ModelTC / MQBench
Model Quantization Benchmark
☆849Updated 7 months ago
MegEngine / mgeconvert
MegEngine到其他框架的转换器
☆70Updated 2 years ago
aojunzz / NM-sparsity
☆243Updated 3 years ago
yhhhli / BRECQ
Pytorch implementation of BRECQ, ICLR 2021
☆285Updated 4 years ago
tpoisonooo / llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
☆367Updated 2 years ago
TRT2022 / MST-plus-plus-TensorRT
TensorRT 2022复赛方案：首个基于Transformer的图像重建模型MST++的TensorRT模型推断优化
☆143Updated 3 years ago
deepglint / EasyQuant
EasyQuant(EQ) is an efficient and simple post-training quantization method via effectively optimizing the scales of weights and activatio…
☆404Updated 3 years ago
ucbrise / actnn
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
☆199Updated 2 years ago
BlinkDL / RWKV-CUDA
The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )
☆224Updated 11 months ago
hahnyuan / RPTQ4LLM
Reorder-based post-training quantization for large language model
☆197Updated 2 years ago
facebookresearch / LLM-QAT
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆320Updated 8 months ago
ThanatosShinji / onnx-tool
A parser, editor and profiler tool for ONNX models.
☆465Updated 2 weeks ago
nbasyl / LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
☆217Updated last year
IST-DASLab / OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆129Updated 2 years ago
Guangxuan-Xiao / torch-int
This repository contains integer operators on GPUs for PyTorch.
☆223Updated 2 years ago
OpenPPL / ppl.nn.llm
☆139Updated last year
hahnyuan / PTQ4ViT
Post-Training Quantization for Vision transformers.
☆234Updated 3 years ago
Jermmy / pytorch-quantization-demo
A simple network quantization demo using pytorch from scratch.
☆539Updated 2 years ago
MegEngine / examples
A set of examples around MegEngine
☆31Updated last year
Qualcomm-AI-research / FP8-quantization
☆165Updated 2 years ago
Tencent / TPAT
TensorRT Plugin Autogen Tool
☆368Updated 2 years ago
Zhen-Dong / HAWQ
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
☆451Updated 2 years ago
ModelTC / LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
☆622Updated last week
haochengxi / Train_Transformers_with_INT4
☆157Updated 2 years ago
ModelTC / NART
NART = NART is not A RunTime, a deep learning inference framework.
☆37Updated 2 years ago
ModelTC / mqbench-paper
☆44Updated 4 years ago