google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆41 · Updated last week
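The package applies post-training quantization to an existing LiteRT (.tflite) flatbuffer via a recipe-based workflow: open the float model, attach a quantization recipe, quantize, and export. The sketch below illustrates that flow; the module layout, the `Quantizer` class, and the `dynamic_wi8_afp32()` recipe preset are assumptions drawn from the project's documented usage and should be verified against the current README.

```python
# Minimal sketch of ai-edge-quantizer usage (assumed API names; verify against the README).
from ai_edge_quantizer import quantizer  # assumed module exposing the Quantizer class
from ai_edge_quantizer import recipe     # assumed module bundling predefined recipes

# Point the quantizer at a float LiteRT (.tflite) model.
qt = quantizer.Quantizer("model_float32.tflite")

# Attach a post-training quantization recipe, e.g. int8 weights with float activations
# (dynamic-range quantization); the preset name is an assumption.
qt.load_quantization_recipe(recipe.dynamic_wi8_afp32())

# Run quantization and export the quantized flatbuffer.
result = qt.quantize()
result.export_model("model_int8_weights.tflite")
```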
Alternatives and similar repositories for ai-edge-quantizer
Users interested in ai-edge-quantizer are comparing it to the libraries listed below.
- Visualize ONNX models with model-explorer ☆34 · Updated 2 weeks ago
- Model compression for ONNX ☆96 · Updated 6 months ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆43 · Updated last week
- Explore training for quantized models ☆18 · Updated this week
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- ONNX and TensorRT implementation of Whisper ☆63 · Updated 2 years ago
- Common utilities for ONNX converters ☆270 · Updated 6 months ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆149 · Updated last week
- ☆21 · Updated 3 weeks ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated last year
- A Toolkit to Help Optimize Large Onnx Model ☆158 · Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 · Updated 2 months ago
- A Toolkit to Help Optimize Onnx Model ☆153 · Updated this week
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 · Updated 2 months ago
- TFLite model analyzer & memory optimizer ☆127 · Updated last year
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆63 · Updated 8 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆356 · Updated this week
- Inference of quantization aware trained networks using TensorRT ☆81 · Updated 2 years ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆68 · Updated this week
- A code generator from ONNX to PyTorch code ☆138 · Updated 2 years ago
- Prototype routines for GPU quantization written using PyTorch. ☆21 · Updated 3 months ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆620 · Updated this week
- The Triton backend for the ONNX Runtime. ☆148 · Updated 3 weeks ago
- ☆69 · Updated 2 years ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆110 · Updated 3 weeks ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆64 · Updated last week
- Notes and artifacts from the ONNX steering committee ☆26 · Updated this week
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆110 · Updated 6 months ago
- ☆149 · Updated 2 years ago
- Fast low-bit matmul kernels in Triton ☆311 · Updated this week