onnx / neural-compressor
Model compression for ONNX
☆92 Updated 6 months ago
Alternatives and similar repositories for neural-compressor
Users who are interested in neural-compressor are comparing it to the libraries listed below.
- A Toolkit to Help Optimize ONNX Model ☆145 Updated this week
- Common utilities for ONNX converters ☆269 Updated 5 months ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 Updated 2 months ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 Updated 2 years ago
- Use safetensors with ONNX 🤗 ☆58 Updated 2 months ago
- A Toolkit to Help Optimize Large ONNX Model ☆156 Updated last year
- ONNX Command-Line Toolbox ☆35 Updated 7 months ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆16 Updated last year
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆39 Updated 2 years ago
- The Triton backend for the ONNX Runtime. ☆145 Updated this week
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ☆32 Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency ☆108 Updated 8 months ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 Updated last year
- Fast low-bit matmul kernels in Triton ☆301 Updated this week
- Visualize ONNX models with model-explorer ☆33 Updated 2 months ago
- The Triton backend for TensorRT. ☆75 Updated this week
- Count number of parameters / MACs / FLOPS for ONNX models. ☆92 Updated 6 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆351 Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆112 Updated this week
- Scailable ONNX python tools ☆97 Updated 6 months ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆51 Updated this week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆65 Updated this week
- ☆28 Updated 3 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆92 Updated 6 years ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆170 Updated last month
- ☆158 Updated last year
- ☆69 Updated last month
- ☆69 Updated 2 years ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc). ☆22 Updated this week
- Accelerate PyTorch models with ONNX Runtime ☆360 Updated 2 months ago