onnx / turnkeyml
LLM SDK for OnnxRuntime GenAI (OGA)
☆112 · Updated this week
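For context, here is a minimal sketch of the OnnxRuntime GenAI (OGA) Python API that this SDK builds on. It is an assumption-based example, not turnkeyml's own interface: `model_dir` is a placeholder for a folder containing an OGA-exported model, and method names such as `append_tokens` have shifted between OGA releases.

```python
# Minimal text-generation sketch with onnxruntime-genai (OGA).
# "model_dir" is a placeholder; API details vary across OGA versions.
import onnxruntime_genai as og

model = og.Model("model_dir")        # loads the ONNX model + genai config
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))

# Decode one token at a time until the search terminates.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```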
Alternatives and similar repositories for turnkeyml:
Users interested in turnkeyml are comparing it to the libraries listed below.
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python (a sketch follows the list below). ☆325 · Updated this week
- Use safetensors with ONNX 🤗 ☆51 · Updated 2 weeks ago
- Model compression for ONNX ☆87 · Updated 4 months ago
- This repository contains Dockerfiles, scripts, yaml files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow … ☆39 · Updated this week
- OpenVINO Tokenizers extension ☆31 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 5 months ago
- Run Generative AI models with a simple C++/Python API using OpenVINO Runtime ☆245 · Updated this week
- AMD's graph optimization engine. ☆213 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- Common utilities for ONNX converters ☆259 · Updated 3 months ago
- AMD-related optimizations for transformer models ☆70 · Updated 4 months ago
- OpenAI Triton backend for Intel® GPUs ☆169 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆81 · Updated this week
- ☆116 · Updated 11 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆62 · Updated 2 weeks ago
- ☆157 · Updated this week
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆262 · Updated 11 months ago
- Advanced quantization algorithm for LLMs/VLMs. ☆394 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆142 · Updated this week
- Composable Kernel: Performance-portable programming model for machine learning tensor operators ☆365 · Updated this week
- Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024) ☆177 · Updated 11 months ago
- Fast low-bit matmul kernels in Triton ☆267 · Updated this week
- Development repository for the Triton language and compiler ☆114 · Updated this week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆212 · Updated 6 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5) ☆240 · Updated 4 months ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆451 · Updated this week
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, with easy export to ONNX/ONNX Runtime ☆163 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆217 · Updated this week
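As referenced from the ONNX Script entry above, here is a minimal sketch of authoring an ONNX function in a subset of Python. It follows the pattern in the onnxscript README; the `scaled_add` function itself is an illustrative example, not part of the library.

```python
# Author an ONNX function in Python with ONNX Script, then export it
# as a standalone model. "scaled_add" is a made-up example function.
from onnxscript import FLOAT, script
from onnxscript import opset18 as op

@script()
def scaled_add(A: FLOAT[...], B: FLOAT[...]) -> FLOAT[...]:
    # Python arithmetic on tensors is translated to ONNX ops
    # (Constant, Mul, Add), not executed eagerly.
    return 0.5 * A + B

# Convert the decorated function into an ONNX ModelProto.
model_proto = scaled_add.to_model_proto()
```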