onnx / turnkeyml
Local LLM Server with NPU Acceleration
☆176 · Updated this week
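Since turnkeyml is pitched as a local LLM server, one way to picture using it is a client talking to a locally hosted, OpenAI-compatible chat endpoint. The sketch below assumes such an endpoint exists; the base URL, port, and model name are illustrative placeholders, not documented turnkeyml defaults.

```python
# Minimal sketch: querying a local LLM server through an
# OpenAI-compatible API. Base URL, port, and model name are
# assumptions for illustration, not turnkeyml defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local endpoint
    api_key="unused",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello from the NPU!"}],
)
print(response.choices[0].message.content)
```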
Alternatives and similar repositories for turnkeyml:
Users interested in turnkeyml are comparing it to the libraries listed below.
- AI Tensor Engine for ROCm ☆187 · Updated this week
- Lightweight inference server for OpenVINO ☆163 · Updated last week
- Run LLM Agents on Ryzen AI PCs in Minutes ☆347 · Updated last week
- The HIP Environment and ROCm Kit - A lightweight open-source build system for HIP and ROCm ☆57 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated 8 months ago
- AMD-related optimizations for transformer models ☆75 · Updated 6 months ago
- Run Generative AI models with a simple C++/Python API using the OpenVINO Runtime ☆269 · Updated this week
- ☆106 · Updated last month
- Use safetensors with ONNX 🤗 ☆56 · Updated 2 months ago
- OpenAI Triton backend for Intel® GPUs ☆183 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆462 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional… ☆93 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆400 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated last week
- High-Performance SGEMM on CUDA devices ☆90 · Updated 3 months ago
- Advanced Quantization Algorithm for LLMs/VLMs ☆449 · Updated last week
- Model compression for ONNX ☆92 · Updated 5 months ago
- ☆156 · Updated last month
- Development repository for the Triton language and compiler ☆118 · Updated this week
- Generative AI extensions for onnxruntime ☆703 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆393 · Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment ☆601 · Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆62 · Updated 2 months ago
- A collection of examples for the ROCm software stack ☆206 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python ☆348 · Updated this week
- ☆510 · Updated last week
- An experimental CPU backend for Triton ☆110 · Updated last week
- AMD's graph optimization engine ☆216 · Updated this week
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to onnx/onnx-runtime ☆168 · Updated last month
- OpenVINO Intel NPU Compiler ☆49 · Updated this week