microsoft / onnxruntime-genai
Generative AI extensions for onnxruntime
☆911 · Updated this week
Alternatives and similar repositories for onnxruntime-genai
Users interested in onnxruntime-genai are comparing it to the libraries listed below.
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs. ☆2,211 · Updated this week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆431 · Updated last week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆522 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆414 · Updated this week
- Run Generative AI models with a simple C++/Python API using OpenVINO Runtime ☆403 · Updated this week
- LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via e… ☆1,163 · Updated this week
- Examples for using ONNX Runtime for model training. ☆358 · Updated last year
- Intel® NPU Acceleration Library ☆703 · Updated 8 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) an… ☆868 · Updated last week
- Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Tra… ☆775 · Updated this week
- Low-bit LLM inference on CPU/NPU with lookup table ☆903 · Updated 6 months ago
- ☆1,027 · Updated last year
- No-code CLI designed for accelerating ONNX workflows ☆222 · Updated 6 months ago
- Common utilities for ONNX converters ☆289 · Updated last week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆880 · Updated last week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,169 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆902 · Updated last week
- The Triton TensorRT-LLM Backend ☆909 · Updated last week
- Universal cross-platform tokenizer bindings for HF and sentencepiece ☆437 · Updated 4 months ago
- Examples for using ONNX Runtime for machine learning inferencing. ☆1,571 · Updated last week
- A Python package that extends the official PyTorch to deliver additional performance on Intel platforms ☆1,997 · Updated last week
- LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU vi… ☆943 · Updated this week
- Use safetensors with ONNX 🤗 ☆78 · Updated 2 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆758 · Updated this week
- A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresse… ☆1,720 · Updated this week
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆670 · Updated 8 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆218 · Updated last year
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,447 · Updated last week
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) a… ☆350 · Updated last week