tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆20 · Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆315 · Updated 4 months ago
- Tenstorrent MLIR compiler ☆169 · Updated this week
- TVM for Tenstorrent ASICs ☆25 · Updated this week
- TTNN Compiler for PyTorch 2: enables running PyTorch models on Tenstorrent hardware using the eager or compile path ☆53 · Updated this week
- Tenstorrent Firmware repository ☆18 · Updated 2 weeks ago
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆48 · Updated this week
- Tenstorrent Kernel Module ☆50 · Updated this week
- Buda Compiler Backend for Tenstorrent devices ☆30 · Updated 4 months ago
- TT-Studio: An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆26 · Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆99 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆198 · Updated this week
- Frontend integration for PyTorch with tt-mlir ☆23 · Updated this week
- TT-NN operator library and TT-Metalium low-level kernel programming model. ☆1,069 · Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆31 · Updated 4 months ago
- An experimental CPU backend for Triton ☆139 · Updated 2 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆444 · Updated this week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆270 · Updated last month
- Tenstorrent console based hardware information program ☆51 · Updated last week
- AI Tensor Engine for ROCm ☆249 · Updated this week
- Repository of model demos using TT-Buda ☆62 · Updated 4 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆94 · Updated this week
- ☆21 · Updated this week
- Tenstorrent Firmware Update Utility ☆7 · Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆656 · Updated this week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆454 · Updated this week
- Development repository for the Triton language and compiler ☆127 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆264 · Updated this week
- Repository for the AI Compiler team, intended for implementing a PJRT device. ☆19 · Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆207 · Updated 6 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆110 · Updated 2 months ago