google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆56 · Updated last week
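Post-training quantization, as the project description uses the term, maps a trained model's float weights onto low-bit integers (e.g. int8) with a scale factor, without retraining. As a rough illustration of the underlying idea only — this is generic quantization math, not ai-edge-quantizer's actual API, and the function names here are hypothetical — a symmetric per-tensor int8 scheme can be sketched as:

```python
# Minimal sketch of symmetric per-tensor int8 post-training quantization.
# Illustrative only: not ai-edge-quantizer's API; function names are made up.

def quantize_symmetric_int8(weights):
    """Map float weights to int8 values plus a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the int8 range after rounding.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.0, 1.0]
q, s = quantize_symmetric_int8(w)   # q == [50, -127, 0, 100]
w_hat = dequantize(q, s)            # close to the original weights
```

Real toolchains (including the quantizers listed on this page) add per-channel scales, zero points for asymmetric schemes, and calibration over activation statistics, but the weight path reduces to this scale-and-round step.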
Alternatives and similar repositories for ai-edge-quantizer
Users interested in ai-edge-quantizer are comparing it to the libraries listed below.
- Model compression for ONNX ☆97 · Updated 8 months ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆732 · Updated this week
- Visualize ONNX models with model-explorer ☆39 · Updated 2 months ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆74 · Updated this week
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆409 · Updated 3 weeks ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆289 · Updated last year
- A Toolkit to Help Optimize ONNX Models ☆188 · Updated last week
- ONNX and TensorRT implementation of Whisper ☆64 · Updated 2 years ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆33 · Updated this week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆62 · Updated 2 weeks ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆369 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆151 · Updated last week
- 🤗 Optimum ExecuTorch ☆58 · Updated this week
- ONNX implementation of Whisper. PyTorch-free. ☆101 · Updated 8 months ago
- LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance runtime for on-device AI. Now with LiteRT Next, we're exp… ☆688 · Updated this week
- High-Performance SGEMM on CUDA devices ☆98 · Updated 6 months ago
- A Toolkit to Help Optimize Large ONNX Models ☆157 · Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 · Updated 4 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆66 · Updated 10 months ago
- Common utilities for ONNX converters ☆276 · Updated 3 weeks ago
- ☆205 · Updated 3 years ago
- Use safetensors with ONNX 🤗 ☆69 · Updated last month
- Fast low-bit matmul kernels in Triton ☆338 · Updated last week
- A block-oriented training approach for inference-time optimization. ☆33 · Updated 11 months ago
- A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op… ☆296 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated 9 months ago
- ☆149 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆142 · Updated this week