microsoft / Olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
★1,982 · Updated this week
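Olive workflows are config-driven: a JSON workflow file names the input model and the optimization passes (conversion, quantization, etc.) to execute. A minimal sketch, assuming Olive's documented `olive.workflows.run` entry point; the config filename is illustrative, not part of this listing:

```python
# Minimal Olive sketch, assuming the documented config-driven workflow.
# "olive_config.json" is an illustrative filename whose JSON body would
# name the input model and the passes (e.g. ONNX conversion, quantization).
from olive.workflows import run as olive_run

olive_run("olive_config.json")  # executes every pass listed in the config
```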
Alternatives and similar repositories for Olive
Users interested in Olive are comparing it to the libraries listed below:
- Generative AI extensions for onnxruntime ★749 · Updated this week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★2,967 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ★475 · Updated this week
- Examples for using ONNX Runtime for model training. ★338 · Updated 8 months ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ★2,443 · Updated last week
- A PyTorch quantization backend for Optimum ★959 · Updated 3 weeks ago
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ★397 · Updated this week
- ONNXMLTools enables conversion of models to ONNX ★1,092 · Updated 3 weeks ago
- A Python package that extends official PyTorch to easily obtain performance gains on Intel platforms ★1,892 · Updated last week
- ONNX Optimizer ★726 · Updated last week
- ★1,027 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ★360 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ★2,529 · Updated this week
- Examples for using ONNX Runtime for machine learning inferencing. ★1,414 · Updated this week
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,137 · Updated last year
- DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for comm… ★2,478 · Updated 2 weeks ago
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ★4,654 · Updated 3 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,026 · Updated last week
- Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX ★2,441 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ★2,166 · Updated 9 months ago
- Common utilities for ONNX converters ★273 · Updated last week
- High-efficiency floating-point neural network inference operators for mobile, server, and Web ★2,054 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ★2,202 · Updated last month
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc.… ★1,028 · Updated last week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ★188 · Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ★881 · Updated this week
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ★1,438 · Updated 11 months ago
- A tool to visually modify ONNX models, based on Netron and Flask. ★1,524 · Updated 4 months ago
- A machine learning compiler for GPUs, CPUs, and ML accelerators ★3,327 · Updated this week
- Simplify your ONNX model (see the usage sketch below) ★4,110 · Updated 10 months ago
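To make the listing concrete, the last entry (onnx-simplifier) exposes a small Python API for constant folding and redundant-node removal. A minimal sketch, assuming its documented `onnxsim.simplify` entry point; the file paths are illustrative:

```python
# Minimal onnx-simplifier sketch; file paths are illustrative.
import onnx
from onnxsim import simplify

model = onnx.load("model.onnx")               # original graph
model_simplified, check_ok = simplify(model)  # fold constants, prune redundant nodes
assert check_ok, "simplified model failed the output-equivalence check"
onnx.save(model_simplified, "model-simplified.onnx")
```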