huggingface / optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
☆430 · Updated this week
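A minimal sketch of how optimum-intel is typically used to run a Transformers model through OpenVINO (assumes `pip install optimum[openvino]`; the model ID below is only an illustrative placeholder):

```python
# Hedged sketch: load a Transformers checkpoint and run it with the OpenVINO backend.
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder model

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("OpenVINO makes inference on Intel hardware fast."))
```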
Alternatives and similar repositories for optimum-intel:
Users interested in optimum-intel are comparing it to the libraries listed below.
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆165 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆280 · Updated last month
- Run Generative AI models with a simple C++/Python API using the OpenVINO Runtime ☆198 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆352 · Updated 4 months ago
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆349 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆304 · Updated this week
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillati… ☆667 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆257 · Updated 3 months ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ☆763 · Updated last month
- Common utilities for ONNX converters ☆256 · Updated last month
- Advanced Quantization Algorithm for LLMs/VLMs. ☆344 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆623 · Updated this week
- The Triton backend for the ONNX Runtime. ☆136 · Updated this week
- FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ☆680 · Updated 4 months ago
- Common source, scripts and utilities for creating Triton backends. ☆305 · Updated this week
- A PyTorch quantization backend for Optimum (see the quantization sketch after this list) ☆865 · Updated last week
- Triton Model Analyzer is a CLI tool for understanding the compute and memory requirements of the Triton Inference Serv… ☆446 · Updated this week
- Generative AI extensions for onnxruntime ☆581 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆58 · Updated last month
- The Triton TensorRT-LLM Backend ☆745 · Updated last week
- OpenVINO Tokenizers extension ☆28 · Updated this week
- Examples for using ONNX Runtime for model training. ☆322 · Updated 2 months ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,298 · Updated this week
- Triton Python, C++ and Java client libraries, and gRPC-generated client examples for Go, Java and Scala ☆585 · Updated this week
- Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or cloud-native. ☆484 · Updated last week
- Pipeline Parallelism for PyTorch ☆736 · Updated 4 months ago
- LLaMA/RWKV ONNX models, quantization and test cases ☆356 · Updated last year
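Quantization sketch for the "PyTorch quantization backend for Optimum" entry above (optimum-quanto). This is a rough illustration, assuming the `optimum-quanto` package is installed and exposes `quantize`/`freeze` as in its documentation; the model ID is an illustrative placeholder, not a recommendation:

```python
# Hedged sketch: weight-only int8 quantization of a small causal LM with optimum-quanto.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.quanto import quantize, freeze, qint8

model_id = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Swap linear layers for quantized equivalents (int8 weights), then freeze
# to materialize the quantized weights and drop the float originals.
quantize(model, weights=qint8)
freeze(model)

inputs = tokenizer("Quantized models run", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```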