quic / aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
☆2,148 · Updated this week
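To make the idea behind tools like AIMET concrete, here is a minimal, self-contained sketch of uniform affine (asymmetric) INT8 quantization, the core operation underlying post-training quantization. This is a generic illustration, not AIMET's actual API; the function names are hypothetical.

```python
# Minimal sketch of uniform affine (asymmetric) INT8 quantization.
# Generic illustration only -- not AIMET's real API.

def quantize(values, num_bits=8):
    """Map floats to integers in [0, 2^n - 1] via a scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    hi = max(hi, lo + 1e-8)            # guard against a zero-width range
    scale = (hi - lo) / (qmax - qmin)  # float step per integer level
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(qi - zero_point) * scale for qi in q]

# Example: quantize a small weight tensor and inspect the round-trip error.
weights = [-1.2, 0.0, 0.5, 2.3]
q, s, z = quantize(weights)
approx = dequantize(q, s, z)
```

The round-trip error of each value is bounded by roughly half the quantization step `s`; real toolkits like AIMET additionally calibrate ranges from activation statistics and can simulate this rounding during training (quantization-aware training).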
Related projects
Alternatives and complementary repositories for aimet
- [ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment ☆1,884 · Updated 11 months ago
- Simplify your onnx model ☆3,865 · Updated 2 months ago
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,597 · Updated this week
- Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distille… ☆4,351 · Updated last year
- A list of papers, docs, codes about model quantization. This repo is aimed to provide the info for model quantization research, we are co… ☆1,886 · Updated 2 weeks ago
- A curated list of neural network pruning resources. ☆2,361 · Updated 7 months ago
- ONNX-TensorRT: TensorRT backend for ONNX ☆2,953 · Updated 2 weeks ago
- Model Quantization Benchmark ☆765 · Updated 5 months ago
- Neural Network Compression Framework for enhanced OpenVINO™ inference ☆943 · Updated this week
- Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators ☆1,528 · Updated 5 years ago
- Bolt is a deep learning library with high performance and heterogeneous flexibility. ☆918 · Updated 3 months ago
- Actively maintained ONNX Optimizer ☆647 · Updated 8 months ago
- Tensorflow Backend for ONNX ☆1,284 · Updated 7 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… ☆1,979 · Updated this week
- High-efficiency floating-point neural network inference operators for mobile, server, and Web ☆1,885 · Updated this week
- PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool. ☆1,558 · Updated 7 months ago
- A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning. ☆1,493 · Updated this week
- Flops counter for convolutional networks in pytorch framework ☆2,822 · Updated last month
- Rethinking the Value of Network Pruning (Pytorch) (ICLR 2019) ☆1,510 · Updated 4 years ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,227 · Updated this week
- micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantiz… ☆2,219 · Updated 3 years ago
- An easy to use PyTorch to TensorRT converter ☆4,612 · Updated 3 months ago
- Awesome machine learning model compression research papers, quantization, tools, and learning material. ☆491 · Updated last month
- [ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware ☆1,425 · Updated 2 months ago
- TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework. ☆754 · Updated 3 weeks ago
- Serve, optimize and scale PyTorch models in production ☆4,218 · Updated 3 weeks ago
- PyTorch library to facilitate development and standardized evaluation of neural network pruning methods. ☆424 · Updated last year
- Collection of recent methods on (deep) neural network compression and acceleration. ☆930 · Updated 2 months ago
- A tool to modify ONNX models in a visualization fashion, based on Netron and Flask. ☆1,346 · Updated 2 weeks ago