huggingface / optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
☆532 · Updated this week
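A minimal sketch of the library's core workflow, assuming the OpenVINO backend; the model ID is only an example:

```python
# Run a 🤗 Transformers model on the OpenVINO runtime via optimum-intel.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"  # example checkpoint; any Hub causal-LM works
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Intel-accelerated inference is", max_new_tokens=20))
```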
Alternatives and similar repositories for optimum-intel
Users interested in optimum-intel are comparing it to the libraries listed below.
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆205 · Updated last week
- An innovative library for efficient LLM inference via low-bit quantization ☆352 · Updated last year
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆327 · Updated 4 months ago
- Run Generative AI models with a simple C++/Python API on top of the OpenVINO Runtime ☆428 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python (see the ONNX Script sketch after this list). ☆420 · Updated this week
- A Python package extending official PyTorch to easily obtain extra performance on Intel platforms (see the IPEX sketch after this list) ☆2,010 · Updated this week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆441 · Updated this week
- Common utilities for ONNX converters ☆294 · Updated last month
- The Triton backend for the ONNX Runtime. ☆172 · Updated this week
- Examples for using ONNX Runtime for model training. ☆361 · Updated last year
- Generative AI extensions for onnxruntime ☆953 · Updated this week
- Common source, scripts and utilities for creating Triton backends. ☆366 · Updated this week
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments (see the PyTriton sketch after this list). ☆832 · Updated 5 months ago
- Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of the Triton Inference Serv… ☆504 · Updated this week
- Intel® NPU Acceleration Library ☆703 · Updated 9 months ago
- OpenVINO Tokenizers extension ☆48 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs (see the serving-engine sketch after this list) ☆267 · Updated 2 months ago
- Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or cloud-native. ☆510 · Updated 9 months ago
- Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Inte… ☆728 · Updated this week
- Triton Python, C++ and Java client libraries, and gRPC-generated client examples for Go, Java and Scala. ☆677 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆64 · Updated 7 months ago
- AMD-related optimizations for transformer models ☆97 · Updated 3 months ago
- ☆137 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ☆845 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆819 · Updated this week
- ☆413 · Updated 2 years ago
- GPTQ inference Triton kernel ☆321 · Updated 2 years ago
- A PyTorch quantization backend for Optimum (see the quantization sketch after this list) ☆1,022 · Updated 2 months ago
- Intel® Extension for TensorFlow* ☆349 · Updated 3 months ago
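The ONNX Script sketch referenced above: authoring a small ONNX function in a subset of Python. A hedged example, not taken from the repository itself; the sigmoid-based GELU approximation (constant 1.702) is just an illustrative function body:

```python
# Author an ONNX function in plain Python with ONNX Script.
from onnxscript import FLOAT, script
from onnxscript import opset18 as op

@script()
def fast_gelu(x: FLOAT[...]) -> FLOAT[...]:
    # Sigmoid approximation of GELU: x * sigmoid(1.702 * x).
    c = op.Constant(value_float=1.702)
    return op.Mul(x, op.Sigmoid(op.Mul(c, x)))

# The decorated function can be exported, e.g. to an ONNX FunctionProto.
proto = fast_gelu.to_function_proto()
```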
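The IPEX sketch referenced above, assuming a CPU with bf16 support and using a torchvision model purely for illustration:

```python
# Optimize an eager-mode PyTorch model with Intel Extension for PyTorch.
import torch
import intel_extension_for_pytorch as ipex
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
# ipex.optimize applies Intel-specific operator and memory-layout
# optimizations; bf16 assumes hardware support (e.g. a recent Xeon).
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(torch.randn(1, 3, 224, 224))
```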
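The PyTriton sketch referenced above: a near-minimal bind-and-serve loop. The "model" is a trivial doubler so the example stays self-contained:

```python
# Serve a Python callable through Triton with PyTriton.
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(input_1):
    # Trivial "model": double every value in the batch.
    return {"output_1": input_1 * 2}

with Triton() as triton:
    triton.bind(
        model_name="doubler",
        infer_func=infer_fn,
        inputs=[Tensor(name="input_1", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output_1", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # blocks; clients use the usual Triton HTTP/gRPC endpoints
```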
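The serving-engine sketch referenced above. The entry's description matches vLLM's tagline, so this assumes the upstream vLLM API; the model ID is only an example:

```python
# Offline batched generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example checkpoint
params = SamplingParams(temperature=0.8, max_tokens=32)
for out in llm.generate(["High-throughput LLM serving works by"], params):
    print(out.outputs[0].text)
```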
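The quantization sketch referenced above, assuming the optimum-quanto package (the PyTorch quantization backend for Optimum in the listing); the checkpoint is an example:

```python
# Weight-only int8 quantization with optimum-quanto.
from optimum.quanto import freeze, qint8, quantize
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example model
quantize(model, weights=qint8)  # replace Linear weights with int8 qtensors
freeze(model)                   # materialize quantized weights in place
```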