onnx / neural-compressor
Model compression for ONNX
☆98 Updated last year
Alternatives and similar repositories for neural-compressor
Users that are interested in neural-compressor are comparing it to the libraries listed below
- A Toolkit to Help Optimize Large Onnx Model ☆163 Updated 3 months ago
- Use safetensors with ONNX 🤗 ☆87 Updated this week
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆54 Updated 2 months ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆35 Updated 3 weeks ago
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ☆96 Updated last week
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc). ☆48 Updated last week
- A tool convert TensorRT engine/plan to a fake onnx ☆42 Updated 3 years ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆28 Updated 2 years ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆65 Updated 9 months ago
- Common utilities for ONNX converters ☆294 Updated last month
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆306 Updated last year
- The Triton backend for the ONNX Runtime. ☆173 Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆420 Updated last week
- The Triton backend for TensorRT. ☆85 Updated last week
- Visualize ONNX models with model-explorer ☆67 Updated last month
- ONNX Command-Line Toolbox ☆35 Updated last year
- A Toolkit to Help Optimize Onnx Model ☆409 Updated this week
- Count number of parameters / MACs / FLOPS for ONNX models. ☆95 Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆244 Updated this week
- A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op… ☆303 Updated last year
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 Updated last month
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 Updated last year
- Scailable ONNX python tools ☆98 Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆85 Updated last week
- A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface ☆139 Updated last month
- A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible. ☆52 Updated 3 years ago
- ☆70 Updated 2 years ago
- A block oriented training approach for inference time optimization. ☆34 Updated last year
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆432 Updated this week
- Efficient in-memory representation for ONNX, in Python ☆42 Updated this week