onnx / neural-compressor

Model compression for ONNX

☆97

Alternatives and similar repositories for neural-compressor

Users who are interested in neural-compressor are comparing it to the libraries listed below.

google-ai-edge / ai-edge-quantizer
AI Edge Quantizer: flexible post training quantization for LiteRT models.
☆73 Updated this week
Libraries-Openly-Fused / cvGPUSpeedup
A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!
☆53 Updated this week
inisis / OnnxSlim
A Toolkit to Help Optimize Onnx Model
☆228 Updated this week
sdpython / onnx-extended
New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA
☆35 Updated last week
meta-pytorch / tokenizers
C++ implementations for various tokenizers (sentencepiece, tiktoken etc).
☆39 Updated this week
tsingmicro-toolchain / OnnxSlim
A Toolkit to Help Optimize Large Onnx Model
☆161 Updated last year
justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆73 Updated 3 weeks ago
zhenhuaw-me / onnxcli
ONNX Command-Line Toolbox
☆35 Updated last year
microsoft / onnxconverter-common
Common utilities for ONNX converters
☆283 Updated last month
triple-Mu / TensorRT2ONNX
A tool to convert a TensorRT engine/plan to a fake ONNX
☆41 Updated 2 years ago
microsoft / onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
☆404 Updated last week
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆295 Updated last year
PINTO0309 / sne4onnx
A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,…
☆17 Updated 3 weeks ago
quic / efficient-transformers
This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor…
☆82 Updated last week
justinchuby / model-explorer-onnx
Visualize ONNX models with model-explorer
☆62 Updated 2 weeks ago
PINTO0309 / tflite2json2tflite
Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary.
☆27 Updated 2 years ago
triton-inference-server / onnxruntime_backend
The Triton backend for the ONNX Runtime.
☆163 Updated 2 weeks ago
triton-inference-server / tensorrt_backend
The Triton backend for TensorRT.
☆79 Updated 2 weeks ago
scailable / sclblonnx
Scailable ONNX Python tools
☆97 Updated last year
dusty-nv / NanoDB
Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP
☆61 Updated 5 months ago
PINTO0309 / scs4onnx
A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible.
☆52 Updated 3 years ago
triton-inference-server / developer_tools
☆21 Updated 2 weeks ago
wejoncy / QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆180 Updated 6 months ago
PINTO0309 / spo4onnx
Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se…
☆19 Updated last year
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆183 Updated this week
gmalivenko / onnx-opcounter
Count number of parameters / MACs / FLOPS for ONNX models.
☆94 Updated last year
inisis / OnnxLLM
Large Language Model Onnx Inference Framework
☆36 Updated 9 months ago
Oneflow-Inc / OneFlow-Pruning
[CVPR-2023] Towards Any Structural Pruning
☆16 Updated 2 years ago
PINTO0309 / simple-onnx-processing-tools
A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op…
☆298 Updated last year
wangkuiyi / huggingface-tokenizer-in-cxx
☆69 Updated 2 years ago