google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆32 · Updated this week
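For context on what the tools on this page do, here is a toy sketch of the core idea behind post-training quantization: mapping float weights to small integers with a shared scale. This is an illustrative assumption-free example in plain Python, not the ai-edge-quantizer API.

```python
# Illustrative only: symmetric per-tensor int8 post-training quantization,
# the basic arithmetic behind the quantization tools listed on this page.
# This is NOT the ai-edge-quantizer API.

def quantize_int8(weights):
    """Map float weights to int8 codes sharing one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
# codes are small integers; approx is close to the original weights
```

Real tools go further: per-channel scales, activation-range calibration on sample data, and rewriting of the model graph, but the arithmetic above is the common core.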
Alternatives and similar repositories for ai-edge-quantizer
Users interested in ai-edge-quantizer often compare it to the libraries listed below.
- Visualize ONNX models with model-explorer ☆33 · Updated 2 months ago
- Model compression for ONNX ☆92 · Updated 5 months ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated last year
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆37 · Updated this week
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆16 · Updated last year
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆569 · Updated this week
- Explore training for quantized models ☆18 · Updated 4 months ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆65 · Updated last week
- A Toolkit to Help Optimize Large Onnx Model ☆156 · Updated last year
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆148 · Updated this week
- A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible. ☆52 · Updated 2 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆34 · Updated 2 years ago
- ONNX implementation of Whisper. PyTorch free. ☆96 · Updated 5 months ago
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆64 · Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 · Updated last month
- ONNX and TensorRT implementation of Whisper ☆61 · Updated last year
- Profile your CoreML models directly from Python 🐍 ☆27 · Updated 7 months ago
- A Toolkit to Help Optimize Onnx Model ☆145 · Updated this week
- Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms delivering high… ☆60 · Updated 6 months ago
- Cross-platform high-speed inference SDK ☆36 · Updated last week
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 · Updated last month
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆394 · Updated last week
- A curated list of awesome inference deployment frameworks for artificial intelligence (AI) models. OpenVINO, TensorRT, MediaPipe, TensorFlo… ☆59 · Updated last year
- High-Performance SGEMM on CUDA devices ☆91 · Updated 3 months ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken, etc.). ☆22 · Updated this week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆56 · Updated 7 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆349 · Updated this week