quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

☆2,383

Alternatives and similar repositories for aimet

Users who are interested in aimet are comparing it to the libraries listed below.

openvinotoolkit / nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
☆1,066 Updated this week
daquexian / onnx-simplifier
Simplify your onnx model
☆4,121 Updated 10 months ago
pytorch / TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
☆2,818 Updated this week
intel / neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R…
☆2,461 Updated last week
quic / aimet-model-zoo
☆332 Updated last year
onnx / optimizer
ONNX Optimizer
☆735 Updated 2 weeks ago
Efficient-ML / Awesome-Model-Quantization
A list of papers, docs, codes about model quantization. This repo is aimed to provide the info for model quantization research, we are co…
☆2,168 Updated 4 months ago
ZhangGe6 / onnx-modifier
A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.
☆1,536 Updated 5 months ago
onnx / onnx-tensorrt
ONNX-TensorRT: TensorRT backend for ONNX
☆3,124 Updated last week
onnx / onnx-tensorflow
Tensorflow Backend for ONNX
☆1,312 Updated last year
mit-han-lab / once-for-all
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
☆1,927 Updated last year
alibaba / TinyNeuralNetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
☆838 Updated 2 months ago
ModelTC / MQBench
Model Quantization Benchmark
☆826 Updated 3 months ago
NVIDIA / TensorRT-Model-Optimizer
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. …
☆1,078 Updated 2 weeks ago
mlcommons / inference
Reference implementations of MLPerf™ inference benchmarks
☆1,420 Updated last week
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,587 Updated this week
ThanatosShinji / onnx-tool
A parser, editor and profiler tool for ONNX models.
☆446 Updated last month
ENOT-AutoDL / onnx2torch
Convert ONNX models to PyTorch.
☆691 Updated 11 months ago
onnx / tensorflow-onnx
Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX
☆2,455 Updated 2 weeks ago
OpenPPL / ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
☆1,714 Updated last year
Xilinx / brevitas
Brevitas: neural network quantization in PyTorch
☆1,362 Updated this week
IntelLabs / distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distille…
☆4,400 Updated 2 years ago
huawei-noah / bolt
Bolt is a deep learning library with high performance and heterogeneous flexibility.
☆953 Updated 3 months ago
PINTO0309 / onnx2tf
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massiv…
☆833 Updated last week
ARM-software / armnn
Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
☆1,275 Updated 3 weeks ago
Talmaj / onnx2pytorch
Transform ONNX model to PyTorch representation
☆338 Updated 8 months ago
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,261 Updated last year
mit-han-lab / smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,461 Updated last year
google / XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
☆2,072 Updated last week
he-y / Awesome-Pruning
A curated list of neural network pruning resources.
☆2,468 Updated last year