google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆49 · Updated this week
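A minimal sketch of what post-training quantization with ai-edge-quantizer can look like, based on the project's documented recipe-driven workflow. The specific names used here (`quantizer.Quantizer`, `load_quantization_recipe`, `recipe.dynamic_wi8_afp32`, `quantize`, `export_model`) are assumptions and may differ from the current API; check the repository's README for the exact calls.

```python
# Hedged sketch: quantize a float LiteRT (.tflite) model with ai-edge-quantizer.
# Class, helper, and method names below are assumptions drawn from the project's
# documented workflow and may not match the current API exactly.
from ai_edge_quantizer import quantizer
from ai_edge_quantizer import recipe

# Load the float model and attach a pre-built quantization recipe
# (assumed helper: int8 dynamic-range weights, float32 activations).
qt = quantizer.Quantizer("model_float32.tflite")
qt.load_quantization_recipe(recipe.dynamic_wi8_afp32())

# Run post-training quantization and export the quantized flatbuffer.
result = qt.quantize()
result.export_model("model_int8.tflite")
```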
Alternatives and similar repositories for ai-edge-quantizer
Users interested in ai-edge-quantizer are comparing it to the libraries listed below.
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆51 · Updated this week
- Model compression for ONNX ☆96 · Updated 7 months ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- Explore training for quantized models ☆18 · Updated this week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆63 · Updated 9 months ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆150 · Updated this week
- Visualize ONNX models with model-explorer ☆36 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 · Updated 3 months ago
- Use safetensors with ONNX 🤗 ☆63 · Updated 3 months ago
- ☆69 · Updated 2 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆94 · Updated 6 years ago
- Fast low-bit matmul kernels in Triton ☆322 · Updated last week
- ☆149 · Updated 2 years ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 · Updated 3 months ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken, etc.). ☆31 · Updated this week
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆72 · Updated last week
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated last year
- 🤗 Optimum ExecuTorch ☆53 · Updated this week
- Count number of parameters / MACs / FLOPS for ONNX models. ☆93 · Updated 8 months ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆172 · Updated 2 months ago
- llama INT4 cuda inference with AWQ ☆54 · Updated 5 months ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆678 · Updated this week
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆80 · Updated 3 weeks ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 · Updated last month
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆110 · Updated 6 months ago
- ONNX and TensorRT implementation of Whisper ☆63 · Updated 2 years ago
- A curated list of awesome inference deployment framework of artificial intelligence (AI) models. OpenVINO, TensorRT, MediaPipe, TensorFlo… ☆62 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- A Toolkit to Help Optimize Onnx Model ☆159 · Updated this week
- ONNX implementation of Whisper. PyTorch free. ☆99 · Updated 7 months ago