huggingface / optimum-intelLinks

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

☆501

Alternatives and similar repositories for optimum-intel

Users that are interested in optimum-intel are comparing it to the libraries listed below

Sorting:

huggingface / optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
☆199Updated this week
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆349Updated last year
openvinotoolkit / openvino.genai
Run Generative AI models with simple C++/Python API and using OpenVINO Runtime
☆364Updated this week
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆318Updated last month
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266Updated last year
microsoft / onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
☆404Updated this week
triton-inference-server / onnxruntime_backend
The Triton backend for the ONNX Runtime.
☆162Updated 2 weeks ago
intel / intel-extension-for-pytorch
A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
☆1,977Updated last week
microsoft / onnxruntime-training-examples
Examples for using ONNX Runtime for model training.
☆351Updated last year
intel / neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R…
☆2,511Updated this week
triton-inference-server / model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…
☆494Updated 2 weeks ago
microsoft / onnxconverter-common
Common utilities for ONNX converters
☆281Updated last month
triton-inference-server / tutorials
This repository contains tutorials and examples for Triton Inference Server
☆787Updated 2 weeks ago
intel / auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
☆668Updated this week
openvinotoolkit / openvino_tokenizers
OpenVINO Tokenizers extension
☆42Updated last week
triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
☆901Updated last week
NVIDIA / cudnn-frontend
cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
☆627Updated 2 weeks ago
microsoft / onnxruntime-extensions
onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime
☆418Updated this week
HabanaAI / Model-References
Reference models for Intel(R) Gaudi(R) AI Accelerator
☆165Updated last month
intel / ai-reference-models
Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Inte…
☆719Updated 3 weeks ago
triton-inference-server / fastertransformer_backend
☆413Updated last year
triton-inference-server / backend
Common source, scripts and utilities for creating Triton backends.
☆352Updated 2 weeks ago
intel / intel-npu-acceleration-library
Intel® NPU Acceleration Library
☆691Updated 6 months ago
intel / intel-extension-for-tensorflow
Intel® Extension for TensorFlow*
☆346Updated 7 months ago
openvinotoolkit / nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
☆1,091Updated this week
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆823Updated 2 months ago
microsoft / onnxruntime-genai
Generative AI extensions for onnxruntime
☆861Updated this week
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆916Updated last year
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆995Updated last week
intel / intel-extension-for-deepspeed
Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…
☆63Updated 3 months ago