onnx / neural-compressor
Model compression for ONNX
☆97 Updated 10 months ago
Alternatives and similar repositories for neural-compressor
Users that are interested in neural-compressor are comparing it to the libraries listed below
- A Toolkit to Help Optimize Onnx Model ☆220 Updated last week
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ☆69 Updated this week
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆35 Updated last month
- Use safetensors with ONNX 🤗 ☆69 Updated last week
- A Toolkit to Help Optimize Large Onnx Model ☆160 Updated last year
- Common utilities for ONNX converters ☆282 Updated last month
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆53 Updated 2 weeks ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆296 Updated last year
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆61 Updated 5 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆400 Updated last week
- ONNX Command-Line Toolbox ☆35 Updated 11 months ago
- A tool to convert a TensorRT engine/plan to a fake ONNX model ☆41 Updated 2 years ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc). ☆36 Updated this week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆80 Updated this week
- The Triton backend for TensorRT. ☆78 Updated last month
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 Updated 2 years ago
- The Triton backend for the ONNX Runtime. ☆162 Updated this week
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆180 Updated 6 months ago
- Accelerate PyTorch models with ONNX Runtime ☆364 Updated 7 months ago
- Visualize ONNX models with model-explorer ☆45 Updated this week
- ☆69 Updated 2 years ago
- A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible. ☆52 Updated 3 years ago
- Scailable ONNX python tools ☆97 Updated 11 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆181 Updated last month
- Mobile App Open ☆63 Updated this week
- Count number of parameters / MACs / FLOPS for ONNX models. ☆94 Updated 11 months ago
- A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op… ☆297 Updated last year
- A code generator from ONNX to PyTorch code ☆141 Updated 2 years ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆167 Updated this week
- A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface ☆118 Updated last month