onnx / neural-compressor
Model compression for ONNX
☆96 · Updated 7 months ago
Alternatives and similar repositories for neural-compressor
Users interested in neural-compressor are comparing it to the libraries listed below.
- Common utilities for ONNX converters ☆272 · Updated 6 months ago
- A Toolkit to Help Optimize Onnx Model ☆161 · Updated this week
- The Triton backend for TensorRT. ☆77 · Updated last week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆49 · Updated this week
- A Toolkit to Help Optimize Large Onnx Model ☆157 · Updated last year
- The Triton backend for the ONNX Runtime. ☆153 · Updated last week
- Use safetensors with ONNX 🤗 ☆63 · Updated 3 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆288 · Updated last year
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated last year
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆39 · Updated 2 years ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 · Updated 2 years ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆360 · Updated this week
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆51 · Updated last week
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 · Updated 3 months ago
- ONNX Command-Line Toolbox ☆35 · Updated 8 months ago
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. ☆200 · Updated last year
- Scailable ONNX python tools ☆96 · Updated 8 months ago
- Inference of quantization aware trained networks using TensorRT ☆82 · Updated 2 years ago
- ☆69 · Updated 2 years ago
- A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible. ☆52 · Updated 2 years ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆129 · Updated this week
- A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op… ☆296 · Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency ☆110 · Updated 9 months ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆172 · Updated 2 months ago
- Count number of parameters / MACs / FLOPS for ONNX models. ☆93 · Updated 8 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 · Updated 2 weeks ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆150 · Updated this week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆51 · Updated last week
- Accelerate PyTorch models with ONNX Runtime ☆362 · Updated 4 months ago
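Several of the tools listed above (neural-compressor, AI Edge Quantizer, QONNX, the GPTQ/AWQ/HQQ/VPTQ toolbox) revolve around the same basic operation. A minimal pure-Python sketch of symmetric per-tensor int8 post-training quantization, with function names that are illustrative only and not taken from any of these libraries:

```python
# Hypothetical sketch of symmetric per-tensor int8 quantization; none of
# these names come from the libraries above.

def quantize_int8(weights):
    """Map float weights to int8 codes plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0  # symmetric range [-127, 127]
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]

w = [0.5, -1.27, 0.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, within one quantization step
```

Real toolkits differ mainly in how they choose the scale (per-channel vs. per-tensor, calibration data, error-minimizing solvers like GPTQ) and in how the result is serialized into ONNX, LiteRT, or safetensors.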