tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆27 Updated last week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆314 Updated 9 months ago
- TT-Studio: An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆39 Updated last week
- Tenstorrent MLIR compiler ☆243 Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆178 Updated last week
- TVM for Tenstorrent ASICs ☆28 Updated 4 months ago
- Tenstorrent Kernel Module ☆57 Updated last week
- The TT-Forge ONNX is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their p… ☆53 Updated last week
- OpenAI Triton backend for Intel® GPUs ☆225 Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆164 Updated last week
- ☆44 Updated last week
- AI Tensor Engine for ROCm ☆344 Updated last week
- Frontend integration for PyTorch with tt-mlir ☆23 Updated last month
- Buda Compiler Backend for Tenstorrent devices ☆30 Updated 9 months ago
- Repository of model demos using TT-Buda ☆63 Updated 9 months ago
- An experimental CPU backend for Triton ☆173 Updated 2 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆36 Updated 5 months ago
- [Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆61 Updated last month
- Development repository for the Triton language and compiler ☆140 Updated this week
- ☆59 Updated this week
- Tenstorrent console based hardware information program ☆58 Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆105 Updated last week
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,319 Updated last week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆337 Updated 7 months ago
- MLIR-based partitioning system ☆162 Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆94 Updated 3 weeks ago
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆51 Updated this week
- Perplexity GPU Kernels ☆554 Updated 2 months ago
- Shared Middle-Layer for Triton Compilation ☆324 Updated last month
- GitHub mirror of triton-lang/triton repo. ☆126 Updated last week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆125 Updated 2 months ago