onnx / neural-compressor
Model compression for ONNX
☆91 · Updated 5 months ago
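Below is a minimal sketch of the kind of post-training quantization workflow that ONNX model-compression tools such as neural-compressor automate. It uses onnxruntime's `quantize_dynamic` as a stand-in (neural-compressor's own API is not shown on this page), and the model paths are placeholders.

```python
# Hedged illustration: dynamic post-training quantization of an ONNX model,
# using onnxruntime's quantization helpers as a stand-in for a compression
# toolkit. File names are placeholders.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model_fp32.onnx",   # original float32 model (placeholder path)
    model_output="model_int8.onnx",  # quantized output (placeholder path)
    weight_type=QuantType.QInt8,     # store weights as signed 8-bit integers
)
```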
Alternatives and similar repositories for neural-compressor:
Users interested in neural-compressor are comparing it to the libraries listed below.
- New operators for the ReferenceEvaluator and new kernels for onnxruntime (CPU, CUDA) ☆32 · Updated last month
- A toolkit to help optimize large ONNX models ☆154 · Updated 11 months ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆49 · Updated this week
- A toolkit to help optimize ONNX models ☆140 · Updated last week
- Converts TFLite to JSON so it can be edited in an IDE, and converts the edited JSON back to a TFLite binary. ☆27 · Updated 2 years ago
- The Triton backend for the ONNX Runtime. ☆142 · Updated this week
- ONNX Command-Line Toolbox ☆35 · Updated 6 months ago
- Common utilities for ONNX converters ☆267 · Updated 4 months ago
- Counts the number of parameters / MACs / FLOPs for ONNX models (see the parameter-counting sketch after this list). ☆91 · Updated 6 months ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆32 · Updated this week
- Use safetensors with ONNX 🤗 ☆55 · Updated last month
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 6 months ago
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆38 · Updated 2 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 · Updated 4 months ago
- Fast low-bit matmul kernels in Triton ☆294 · Updated this week
- The Triton backend for TensorRT. ☆73 · Updated this week
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆16 · Updated 11 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆344 · Updated this week
- A very simple tool that compresses the overall size of an ONNX model by aggregating duplicate constant values as much as possible (see the deduplication sketch after this list). ☆52 · Updated 2 years ago
- Scailable ONNX Python tools ☆97 · Updated 6 months ago
- The Triton backend for PyTorch TorchScript models. ☆146 · Updated this week
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, with easy export to ONNX/ONNX Runtime. ☆168 · Updated 3 weeks ago
- ONNX and TensorRT implementation of Whisper ☆61 · Updated last year
- Accelerate PyTorch models with ONNX Runtime ☆359 · Updated 2 months ago
- A block-oriented training approach for inference-time optimization. ☆32 · Updated 8 months ago
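As referenced in the parameter / MACs / FLOPs item above, a minimal sketch of how a parameter count can be derived from an ONNX graph: parameters are simply the element counts of the model's initializers. MAC/FLOP counting additionally needs shape inference and per-operator cost formulas, which are omitted here; `model.onnx` is a placeholder path.

```python
# Hedged sketch: count the parameters of an ONNX model by summing the element
# counts of its initializers (weights/biases). Not the counting tool's own code.
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # placeholder path
num_params = sum(numpy_helper.to_array(init).size
                 for init in model.graph.initializer)
print(f"parameters: {num_params:,}")
```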
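And as referenced in the duplicate-constant item above, a minimal sketch of the general idea, assuming exact byte-level duplicates: hash each initializer's contents, keep one copy per unique value, and repoint node inputs at the surviving tensor. This illustrates the technique only (top-level graph, no external data or subgraphs), not that tool's implementation.

```python
# Hedged sketch: fold byte-identical initializers into a single copy and
# rewrite node inputs to reference the kept tensor. Illustrative only.
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # placeholder path
graph = model.graph

seen = {}        # (dtype, shape, raw bytes) -> name of the tensor we keep
rename = {}      # duplicate tensor name -> kept tensor name
duplicates = []

for init in graph.initializer:
    arr = numpy_helper.to_array(init)
    key = (str(arr.dtype), arr.shape, arr.tobytes())
    if key in seen:
        rename[init.name] = seen[key]   # duplicate value: drop it later
        duplicates.append(init)
    else:
        seen[key] = init.name

for dup in duplicates:
    graph.initializer.remove(dup)

for node in graph.node:
    for i, name in enumerate(node.input):
        if name in rename:
            node.input[i] = rename[name]

onnx.save(model, "model_dedup.onnx")  # placeholder path
```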