onnx / neural-compressor
Model compression for ONNX
☆98Updated last year
Alternatives and similar repositories for neural-compressor
Users who are interested in neural-compressor are comparing it to the libraries listed below.
- AI Edge Quantizer: flexible post training quantization for LiteRT models.☆99Updated this week
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!☆54Updated 2 months ago
- A Toolkit to Help Optimize Large ONNX Model☆163Updated 3 months ago
- Use safetensors with ONNX 🤗☆87Updated this week
- Common utilities for ONNX converters☆294Updated last month
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc).☆48Updated last week
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA☆35Updated 3 weeks ago
- A tool to convert a TensorRT engine/plan to a fake ONNX model☆42Updated 3 years ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP☆65Updated 9 months ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,…☆17Updated this week
- ONNX Command-Line Toolbox☆35Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.☆420Updated last week
- The Triton backend for the ONNX Runtime.☆173Updated this week
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary.☆28Updated 2 years ago
- Visualize ONNX models with model-explorer☆67Updated last month
- The Triton backend for TensorRT.☆85Updated last week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor…☆85Updated this week
- A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible.☆52Updated 3 years ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se…☆19Updated last year
- Inference Vision Transformer (ViT) in plain C/C++ with ggml☆306Updated last year
- A Toolkit to Help Optimize ONNX Model☆409Updated this week
- Count number of parameters / MACs / FLOPS for ONNX models.☆95Updated last year
- Efficient in-memory representation for ONNX, in Python☆42Updated this week
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.☆184Updated 10 months ago
- Large Language Model Onnx Inference Framework☆36Updated 2 months ago
- LLM deployment project based on ONNX.☆49Updated last year
- Scailable ONNX python tools☆98Updated last year
- ☆125Updated 2 years ago
- A simple Python tool to measure the performance of ONNX models.☆27Updated last year
- A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op…☆303Updated last year