google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆81 · Updated 3 weeks ago
Alternatives and similar repositories for ai-edge-quantizer
Users who are interested in ai-edge-quantizer are comparing it to the libraries listed below.
- Model compression for ONNX ☆99 · Updated last year
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆300 · Updated last year
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆28 · Updated 2 years ago
- A Toolkit to Help Optimize Onnx Model ☆267 · Updated last week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆427 · Updated this week
- ONNX and TensorRT implementation of Whisper ☆65 · Updated 2 years ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆84 · Updated this week
- 🤗 Optimum ExecuTorch ☆88 · Updated this week
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆35 · Updated last week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆412 · Updated this week
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated 2 months ago
- ONNX implementation of Whisper. PyTorch free. ☆102 · Updated last year
- A project that optimizes Whisper for low latency inference using NVIDIA TensorRT ☆95 · Updated last year
- Mobile App Open ☆64 · Updated this week
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc). ☆43 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆855 · Updated this week
- Use safetensors with ONNX 🤗 ☆76 · Updated 2 months ago
- Visualize ONNX models with model-explorer ☆64 · Updated last month
- High-Performance SGEMM on CUDA devices ☆113 · Updated 10 months ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆54 · Updated 3 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆214 · Updated this week
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆63 · Updated 7 months ago
- 👷 Build compute kernels ☆192 · Updated this week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆104 · Updated this week
- Step by step implementation of a fast softmax kernel in CUDA ☆58 · Updated 11 months ago
- A block oriented training approach for inference time optimization. ☆33 · Updated last year
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard. ☆93 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆166 · Updated this week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆247 · Updated last month
- Implementation of a methodology that allows all sorts of user defined GPU kernel fusion, for non CUDA programmers. ☆32 · Updated this week