google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆68 · Updated last week
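Post-training quantization of the kind AI Edge Quantizer performs maps trained float weights to low-bit integers (typically int8) plus a scale factor, without retraining. A minimal illustrative sketch of symmetric per-tensor int8 quantization in plain NumPy follows; this is the general technique only, not the library's actual API.

```python
# Illustrative sketch of symmetric per-tensor int8 post-training
# quantization -- the core idea behind post-training quantizers.
# Plain NumPy; NOT the AI Edge Quantizer API.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([-0.5, 0.0, 0.25, 0.5], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)                            # int8 values in [-127, 127]
print(np.max(np.abs(w - w_hat)))    # reconstruction error, below one step
```

Real quantizers add per-channel scales, zero points for asymmetric ranges, and calibration over representative data, but the round-and-clip core is the same.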
Alternatives and similar repositories for ai-edge-quantizer
Users interested in ai-edge-quantizer are comparing it to the libraries listed below.
- Model compression for ONNX ☆97 · Updated 10 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆296 · Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆80 · Updated last week
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆35 · Updated last month
- ONNX and TensorRT implementation of Whisper ☆64 · Updated 2 years ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆400 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆797 · Updated this week
- Visualize ONNX models with model-explorer ☆45 · Updated this week
- 🤗 Optimum ExecuTorch ☆67 · Updated this week
- A toolkit to help optimize ONNX models ☆220 · Updated last week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆418 · Updated this week
- Use safetensors with ONNX 🤗 ☆69 · Updated last week
- ONNX implementation of Whisper. PyTorch free. ☆99 · Updated 10 months ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆45 · Updated last month
- High-performance SGEMM on CUDA devices ☆105 · Updated 8 months ago
- A user-friendly toolchain that enables the seamless execution of ONNX models using JAX as the backend. ☆123 · Updated 3 weeks ago
- Inference RWKV v5, v6, and v7 with the Qualcomm AI Engine Direct SDK ☆83 · Updated last week
- Common utilities for ONNX converters ☆282 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆164 · Updated last week
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆53 · Updated 2 weeks ago
- LLM training in simple, raw C/CUDA ☆105 · Updated last year
- A block-oriented training approach for inference-time optimization. ☆34 · Updated last year
- Mobile App Open ☆63 · Updated this week
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆90 · Updated 11 months ago
- Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high … ☆67 · Updated 2 months ago
- Python package of rocm-smi-lib ☆24 · Updated 2 months ago
- Simple tool for partial optimization of ONNX. Further optimizes some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated last year
- Quantized LLM training in pure CUDA/C++. ☆180 · Updated this week