microsoft / onnxruntime-genai
Generative AI extensions for onnxruntime
☆501 · Updated this week
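As a rough sketch of what the "Generative AI extensions for onnxruntime" provide, the snippet below follows the typical Python usage pattern from the project's examples: load an exported model, tokenize a prompt, set search options, and generate. Exact method names and return types have varied between onnxruntime-genai releases, and the model path is a placeholder, so treat this as an assumption-laden illustration rather than the definitive API.

```python
# Sketch only: method names differ across onnxruntime-genai releases, and
# "path/to/exported-model" is a placeholder for a folder containing an
# ONNX-exported generative model plus its tokenizer files.
import onnxruntime_genai as og

model = og.Model("path/to/exported-model")   # load the exported ONNX model
tokenizer = og.Tokenizer(model)              # tokenizer bundled with the model

prompt = "def print_prime(n):"
tokens = tokenizer.encode(prompt)            # encode the prompt to token ids

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)    # cap the generated sequence length
params.input_ids = tokens

output_tokens = model.generate(params)       # run the full generation loop
print(tokenizer.decode(output_tokens[0]))    # decode the first returned sequence
```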
Related projects
Alternatives and complementary repositories for onnxruntime-genai
- Run Generative AI models with a simple C++/Python API using OpenVINO Runtime ☆144 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆280 · Updated this week
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillati… ☆534 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆404 · Updated this week
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs. ☆1,594 · Updated this week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆334 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated 2 months ago
- ☆1,021 · Updated 10 months ago
- LLaMa/RWKV ONNX models, quantization and test cases ☆350 · Updated last year
- Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t… ☆245 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ☆698 · Updated last week
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆494 · Updated this week
- Examples for using ONNX Runtime for model training. ☆311 · Updated 2 weeks ago
- Common utilities for ONNX converters ☆251 · Updated 4 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆152 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆661 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,136 · Updated last month
- FlashInfer: Kernel Library for LLM Serving ☆1,395 · Updated this week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆611 · Updated 2 months ago
- A PyTorch quantization backend for optimum ☆818 · Updated this week
- ☆501 · Updated last week
- The Triton TensorRT-LLM Backend ☆703 · Updated this week
- For releasing code related to compression methods for transformers, accompanying our publications ☆369 · Updated 3 weeks ago
- A throughput-oriented high-performance serving framework for LLMs ☆629 · Updated last month
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆251 · Updated this week
- [NeurIPS'24 Spotlight] To speed up long-context LLM inference, compute attention approximately and with dynamic sparsity, which reduces in… ☆776 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆360 · Updated this week
- Intel® NPU Acceleration Library ☆499 · Updated last week
- Low-bit LLM inference on CPU with lookup table ☆559 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆250 · Updated 3 weeks ago