google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆88 · Updated this week
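Post-training quantization, as described above, rewrites a trained model's float weights as low-bit integers plus a scale factor, with no retraining. As a rough, library-agnostic sketch of the underlying idea (this is NOT the ai-edge-quantizer API; the function names below are hypothetical), symmetric per-tensor int8 quantization looks like:

```python
# Minimal sketch of symmetric per-tensor int8 post-training quantization.
# Hypothetical helper names; real toolkits (ai-edge-quantizer, MCT, QONNX)
# add per-channel scales, zero points, and calibration on sample data.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 so that w ≈ scale * q."""
    scale = np.max(np.abs(weights)) / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# rounding error is at most scale/2 per element
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-7
```

Real post-training pipelines also quantize activations, which requires running calibration inputs through the model to estimate ranges; the weight-only mapping above is just the simplest case.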
Alternatives and similar repositories for ai-edge-quantizer
Users interested in ai-edge-quantizer are comparing it to the libraries listed below.
- Model compression for ONNX ☆99 · Updated last year
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆305 · Updated last year
- A Toolkit to Help Optimize Onnx Model ☆308 · Updated last week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆85 · Updated this week
- Mobile App Open ☆67 · Updated this week
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆28 · Updated 2 years ago
- ONNX and TensorRT implementation of Whisper ☆66 · Updated 2 years ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆903 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆418 · Updated this week
- C++ implementations for various tokenizers (sentencepiece, tiktoken, etc.). ☆46 · Updated this week
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated 3 months ago
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆97 · Updated last year
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆431 · Updated this week
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆35 · Updated last month
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆64 · Updated 8 months ago
- 🤗 Optimum ExecuTorch ☆101 · Updated last week
- ONNX implementation of Whisper. PyTorch free. ☆102 · Updated last year
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆68 · Updated last year
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆90 · Updated last month
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆111 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆170 · Updated last week
- Visualize ONNX models with model-explorer ☆66 · Updated last week
- Use safetensors with ONNX 🤗 ☆81 · Updated last week
- Efficient in-memory representation for ONNX, in Python ☆41 · Updated this week
- A Toolkit to Help Optimize Large Onnx Model ☆163 · Updated 2 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆130 · Updated last month
- Common utilities for ONNX converters ☆291 · Updated last month
- The Triton backend for the ONNX Runtime. ☆171 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆233 · Updated this week
- ☆171 · Updated 2 weeks ago