onnx / turnkeyml
No-code CLI designed for accelerating ONNX workflows
☆196 · Updated 2 weeks ago
Alternatives and similar repositories for turnkeyml
Users interested in turnkeyml are comparing it to the libraries listed below
- AI Tensor Engine for ROCm ☆208 · Updated this week
- Local LLM Server with GPU and NPU Acceleration ☆138 · Updated this week
- Lightweight Inference server for OpenVINO ☆187 · Updated last week
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆66 · Updated this week
- Run Generative AI models with a simple C++/Python API using OpenVINO Runtime ☆293 · Updated this week (usage sketch below)
- Run LLM Agents on Ryzen AI PCs in Minutes ☆421 · Updated last week
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆177 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 9 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆608 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆106 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆473 · Updated last week (example below)
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆264 · Updated 8 months ago
- OpenAI Triton backend for Intel® GPUs ☆191 · Updated this week
- ☆108 · Updated last week
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆525 · Updated this week
- AMD related optimizations for transformer models ☆79 · Updated 7 months ago
- ☆541 · Updated last month
- Development repository for the Triton language and compiler ☆125 · Updated this week (kernel example below)
- Use safetensors with ONNX 🤗 ☆63 · Updated 3 months ago
- ☆158 · Updated last week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆629 · Updated last month
- Fast and memory-efficient exact attention ☆174 · Updated this week (example below)
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week (usage sketch below)
- Generative AI extensions for onnxruntime ☆740 · Updated this week (sketch below)
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆129 · Updated this week
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆172 · Updated 2 months ago
- High-speed and easy-use LLM serving framework for local deployment ☆112 · Updated 3 months ago
- A collection of examples for the ROCm software stack ☆224 · Updated this week
- Intel® NPU Acceleration Library ☆680 · Updated 2 months ago (example below)
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆84 · Updated this week
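
Usage sketches for selected entries

Several of the libraries above are easiest to compare by what their minimal Python usage looks like. The sketches below are illustrative, not authoritative: model IDs and paths are placeholders, and APIs may differ between the forks listed here and their upstreams.

OpenVINO GenAI (the "Run Generative AI models…" entry) centers on an `LLMPipeline` class. A minimal sketch, assuming a model directory already exported to OpenVINO IR format:

```python
import openvino_genai

# Load an exported OpenVINO model from a local directory (path is a placeholder)
pipe = openvino_genai.LLMPipeline("./TinyLlama-1.1B-Chat-ov", "CPU")

# Generate a completion; max_new_tokens bounds the output length
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```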
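Optimum Intel wraps Hugging Face transformers models with Intel backends such as OpenVINO. A minimal sketch using `OVModelForCausalLM`, which can export a Hub model to OpenVINO on the fly via `export=True` (the model ID is a placeholder):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # placeholder; any supported causal LM on the Hub
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```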
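The two Triton entries (the Intel GPU backend and the compiler development repository) share the Triton programming model, where GPU kernels are Python functions decorated with `@triton.jit`. The standard vector-add kernel from the Triton tutorials illustrates the model:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```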
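The "Fast and memory-efficient exact attention" entry is a FlashAttention build. Its core Python entry point, `flash_attn_func`, takes `(batch, seqlen, nheads, headdim)` tensors in fp16/bf16 on a CUDA device:

```python
import torch
from flash_attn import flash_attn_func

# Random Q/K/V in the layout flash_attn expects: (batch, seqlen, nheads, headdim)
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # output has the same shape as q
```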
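The vLLM entry (and the unnamed forks carrying the same "high-throughput and memory-efficient inference and serving engine" description) follow upstream vLLM's offline-inference API:

```python
from vllm import LLM, SamplingParams

# Model ID is a placeholder; any model supported by vLLM works
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```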
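The "Generative AI extensions for onnxruntime" entry is onnxruntime-genai. Its Python API has changed noticeably across releases, so treat this as a sketch of the general pattern rather than a version-exact example; the model folder is a placeholder for output from its model builder:

```python
import onnxruntime_genai as og

model = og.Model("./phi-3-mini-int4")  # placeholder model folder
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

# Token-by-token generation loop (method names follow recent releases)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))
```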
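The Intel® NPU Acceleration Library compiles existing PyTorch modules for the NPU. A minimal sketch, assuming a machine with a supported Intel NPU; the tiny MLP is a placeholder for any `torch.nn.Module`:

```python
import torch
import intel_npu_acceleration_library

# Placeholder model; any torch.nn.Module can be compiled
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Compile for the NPU; dtype controls on-device precision
npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

with torch.no_grad():
    out = npu_model(torch.randn(1, 128))
```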