google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆32 · Updated last week
Alternatives and similar repositories for ai-edge-quantizer:
Users interested in ai-edge-quantizer are comparing it to the libraries listed below.
- Model compression for ONNX ☆91 · Updated 5 months ago
- Visualize ONNX models with model-explorer ☆31 · Updated last month
- ☆21 · Updated last week
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆33 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆145 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆543 · Updated last week
- ☆143 · Updated 2 years ago
- Use safetensors with ONNX 🤗 ☆54 · Updated last month
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆16 · Updated 11 months ago
- Explore training for quantized models ☆17 · Updated 3 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆109 · Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 · Updated last month
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆388 · Updated this week
- ONNX and TensorRT implementation of Whisper ☆61 · Updated last year
- Count number of parameters / MACs / FLOPS for ONNX models. ☆91 · Updated 5 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆111 · Updated this week
- Common utilities for ONNX converters ☆266 · Updated 4 months ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated 11 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆56 · Updated 7 months ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆108 · Updated 4 months ago
- ☆63 · Updated 5 months ago
- A Toolkit to Help Optimize Onnx Model ☆140 · Updated this week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆62 · Updated this week
- Fast low-bit matmul kernels in Triton ☆291 · Updated this week
- ☆222 · Updated 2 years ago
- ☆203 · Updated 3 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆34 · Updated 2 years ago
- Reference Kernels for the Leaderboard ☆33 · Updated last week
- TFLite model analyzer & memory optimizer ☆125 · Updated last year