tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆26 · Updated last week
Alternatives and similar repositories for vllm
Users who are interested in vllm compare it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆314 · Updated 9 months ago
- Tenstorrent MLIR compiler ☆231 · Updated this week
- Tenstorrent Kernel Module ☆57 · Updated last week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆53 · Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆162 · Updated this week
- [Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆61 · Updated last week
- Buda Compiler Backend for Tenstorrent devices ☆30 · Updated 9 months ago
- TT-Studio: An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆39 · Updated 3 weeks ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,303 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆223 · Updated this week
- TVM for Tenstorrent ASICs ☆28 · Updated 4 months ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆331 · Updated 6 months ago
- An experimental CPU backend for Triton ☆168 · Updated 2 months ago
- Tenstorrent console based hardware information program ☆58 · Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆143 · Updated last week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆741 · Updated 5 months ago
- Tenstorrent Firmware repository ☆23 · Updated 3 weeks ago
- A fast full-system simulator of Tenstorrent hardware ☆38 · Updated 3 weeks ago
- Tenstorrent Firmware Update Utility ☆10 · Updated last week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆589 · Updated 3 weeks ago
- MLIR-based partitioning system ☆157 · Updated this week
- Development repository for the Triton language and compiler ☆140 · Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆103 · Updated 3 weeks ago
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆556 · Updated this week
- ☆43 · Updated this week
- Repository of model demos using TT-Buda ☆63 · Updated 9 months ago
- Stores documents and resources used by the OpenXLA developer community ☆131 · Updated last year
- Frontend integration for PyTorch with tt-mlir ☆23 · Updated last month
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆36 · Updated 4 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆219 · Updated 11 months ago