google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆49 · Updated this week
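A minimal sketch of what post-training quantization with ai-edge-quantizer can look like, based on the project's documented recipe-driven workflow. The specific names used here (`quantizer.Quantizer`, `load_quantization_recipe`, `recipe.dynamic_wi8_afp32`, `quantize`, `export_model`) are assumptions and may differ from the current API; check the repository's README for the exact calls.

```python
# Hedged sketch: quantize a float LiteRT (.tflite) model with ai-edge-quantizer.
# Class, helper, and method names below are assumptions drawn from the project's
# documented workflow and may not match the current API exactly.
from ai_edge_quantizer import quantizer
from ai_edge_quantizer import recipe

# Load the float model and attach a pre-built quantization recipe
# (assumed helper: int8 dynamic-range weights, float32 activations).
qt = quantizer.Quantizer("model_float32.tflite")
qt.load_quantization_recipe(recipe.dynamic_wi8_afp32())

# Run post-training quantization and export the quantized flatbuffer.
result = qt.quantize()
result.export_model("model_int8.tflite")
```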
Alternatives and similar repositories for ai-edge-quantizer
Users interested in ai-edge-quantizer are comparing it to the libraries listed below.
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆51 · Updated this week
- Model compression for ONNX ☆96 · Updated 7 months ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- Explore training for quantized models ☆18 · Updated this week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆63 · Updated 9 months ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆150 · Updated this week
- Visualize ONNX models with model-explorer ☆36 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 · Updated 3 months ago
- Use safetensors with ONNX 🤗 ☆63 · Updated 3 months ago
- ☆69 · Updated 2 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆94 · Updated 6 years ago
- Fast low-bit matmul kernels in Triton ☆322 · Updated last week
- ☆149 · Updated 2 years ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 · Updated 3 months ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken, etc.). ☆31 · Updated this week
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆72 · Updated last week
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated last year
- 🤗 Optimum ExecuTorch ☆53 · Updated this week
- Count number of parameters / MACs / FLOPS for ONNX models. ☆93 · Updated 8 months ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆172 · Updated 2 months ago
- llama INT4 cuda inference with AWQ ☆54 · Updated 5 months ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆678 · Updated this week
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆80 · Updated 3 weeks ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 · Updated last month
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆110 · Updated 6 months ago
- ONNX and TensorRT implementation of Whisper ☆63 · Updated 2 years ago
- A curated list of awesome inference deployment framework of artificial intelligence (AI) models. OpenVINO, TensorRT, MediaPipe, TensorFlo… ☆62 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- A Toolkit to Help Optimize Onnx Model ☆159 · Updated this week
- ONNX implementation of Whisper. PyTorch free. ☆99 · Updated 7 months ago