onnx / neural-compressor
Model compression for ONNX
☆81 · Updated 2 months ago
Alternatives and similar repositories for neural-compressor:
Users who are interested in neural-compressor are comparing it to the libraries listed below.
- The Triton backend for the ONNX Runtime. ☆136 · Updated this week
- The Triton backend for TensorRT. ☆68 · Updated 2 weeks ago
- A toolkit to help optimize ONNX models ☆106 · Updated this week
- Common utilities for ONNX converters ☆257 · Updated last month
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆313 · Updated this week
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆15 · Updated 8 months ago
- A tool to convert a TensorRT engine/plan to a fake ONNX model ☆37 · Updated 2 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆99 · Updated 4 months ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆31 · Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆257 · Updated 3 months ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated 8 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆66 · Updated this week
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime. ☆155 · Updated this week
- ☆58 · Updated 8 months ago
- A toolkit to help optimize large ONNX models ☆153 · Updated 8 months ago
- Scailable ONNX Python tools ☆96 · Updated 3 months ago
- ONNX Command-Line Toolbox ☆35 · Updated 3 months ago
- The Triton backend for PyTorch TorchScript models. ☆141 · Updated last week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆57 · Updated this week
- ☆18 · Updated last week
- ☢️ TensorRT 2023 second round: accelerating and optimizing Llama model inference with TensorRT-LLM ☆45 · Updated last year
- Home for OctoML PyTorch Profiler ☆107 · Updated last year
- ☆69 · Updated last year
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆48 · Updated last week
- Count number of parameters / MACs / FLOPS for ONNX models. ☆90 · Updated 3 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆20 · Updated 10 months ago
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. ☆188 · Updated 7 months ago
- Exports the ONNX file to a JSON file and JSON dict. ☆33 · Updated 2 years ago
- A code generator from ONNX to PyTorch code ☆135 · Updated 2 years ago