google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
☆24 · Updated this week
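As a quick orientation, here is a minimal usage sketch of post-training quantization with ai-edge-quantizer, assuming the `Quantizer`/`recipe` Python API described in the project's README; the class, helper, and result-field names are assumptions and may differ between releases.

```python
# Hedged sketch of post-training quantization with ai-edge-quantizer.
# The Quantizer class, recipe helper, and QuantizationResult field below are
# assumptions drawn from the project README and may change between releases.
from ai_edge_quantizer import quantizer, recipe

qt = quantizer.Quantizer("model_float32.tflite")         # placeholder path to a float LiteRT model
qt.load_quantization_recipe(recipe.dynamic_wi8_afp32())  # int8 weights, float32 activations
result = qt.quantize()

# Persist the quantized flatbuffer for LiteRT deployment.
with open("model_int8.tflite", "wb") as f:
    f.write(result.quantized_model)
```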
Alternatives and similar repositories for ai-edge-quantizer:
Users who are interested in ai-edge-quantizer are comparing it to the libraries listed below.
- Model compression for ONNX ☆87 · Updated 3 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆39 · Updated 10 months ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆16 · Updated 10 months ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 · Updated 5 months ago
- Explore training for quantized models ☆16 · Updated 2 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆18 · Updated this week
- ☆25 · Updated last week
- edge/mobile transformer-based Vision DNN inference benchmark ☆15 · Updated 2 months ago
- Visualize ONNX models with model-explorer ☆28 · Updated last week
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆34 · Updated 2 years ago
- Experiments with BitNet inference on CPU ☆53 · Updated 11 months ago
- Notes and artifacts from the ONNX steering committee ☆25 · Updated last week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX (see the usage sketch after this list) ☆138 · Updated 2 weeks ago
- The Riallto Open Source Project from AMD ☆74 · Updated 4 months ago
- ☆19 · Updated last week
- ☆69 · Updated 2 years ago
- Nsight Systems In Docker ☆20 · Updated last year
- ☆21 · Updated last week
- Use safetensors with ONNX 🤗 ☆48 · Updated last week
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆43 · Updated last week
- A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible. ☆52 · Updated 2 years ago
- Open Source Projects from Pallas Lab ☆20 · Updated 3 years ago
- Prototype routines for GPU quantization written using PyTorch. ☆19 · Updated last month
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated 3 weeks ago
- ONNX and TensorRT implementation of Whisper ☆61 · Updated last year
- TORCH_LOGS parser for PT2 ☆33 · Updated this week
- Optimize tensor programs fast with Felix, a gradient descent autotuner. ☆24 · Updated 10 months ago
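For the QONNX entry above, a rough sketch (not taken from this listing) of loading an ONNX model with QONNX's `ModelWrapper` and running its standard cleanup passes; the module paths and the `cleanup_model` helper are assumptions based on the published qonnx package and may differ between versions.

```python
# Hedged sketch: QONNX model loading and cleanup.
# ModelWrapper and cleanup_model are assumed from the qonnx PyPI package;
# verify against the project's documentation before relying on them.
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.util.cleanup import cleanup_model

model = ModelWrapper("model.onnx")    # placeholder input path
model = cleanup_model(model)          # shape inference, constant folding, renaming passes
model.save("model_clean.onnx")        # placeholder output path
```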