onnx / neural-compressor
Model compression for ONNX
☆87 · Updated 4 months ago
Alternatives and similar repositories for neural-compressor:
Users interested in neural-compressor are comparing it to the libraries listed below.
- New operators for the ReferenceEvaluator, new kernels for onnxruntime (CPU, CUDA) ☆32 · Updated this week
- The Triton backend for the ONNX Runtime. ☆140 · Updated last week
- A toolkit to help optimize ONNX models ☆124 · Updated this week
- Common utilities for ONNX converters ☆259 · Updated 3 months ago
- A toolkit to help optimize large ONNX models ☆153 · Updated 10 months ago
- Count the number of parameters / MACs / FLOPs of ONNX models. ☆89 · Updated 4 months ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆50 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆323 · Updated this week
- A tool to convert a TensorRT engine/plan to a fake ONNX model ☆38 · Updated 2 years ago
- The Triton backend for TensorRT. ☆70 · Updated last week
- Converts tflite to JSON and makes it editable in the IDE; also converts the edited JSON back to a tflite binary. ☆27 · Updated 2 years ago
- Scailable ONNX python tools ☆97 · Updated 4 months ago
- A simple tool for partial optimization of ONNX models; further optimizes some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated 10 months ago
- A very simple tool that compresses the overall size of an ONNX model by aggregating duplicate constant values as much as possible. ☆52 · Updated 2 years ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆91 · Updated this week
- ☆31 · Updated 2 years ago
- ☆157 · Updated last year
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆179 · Updated 3 months ago
- Accelerate PyTorch models with ONNX Runtime ☆358 · Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 5 months ago
- Inference of Vision Transformer (ViT) in plain C/C++ with ggml ☆262 · Updated 11 months ago
- Use safetensors with ONNX 🤗 ☆51 · Updated 2 weeks ago
- ONNX Command-Line Toolbox ☆35 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton ☆263 · Updated this week
- Implementation of YOLOv9 QAT optimized for deployment on TensorRT platforms. ☆102 · Updated 3 weeks ago
- ☆141 · Updated 2 years ago
- OpenVINO backend for Triton. ☆31 · Updated last week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆27 · Updated this week
- A user-friendly toolchain that enables seamless execution of ONNX models using JAX as the backend. ☆109 · Updated 3 weeks ago
- A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op… ☆290 · Updated 10 months ago