tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆22 Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm compare it with the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆316 Updated 5 months ago
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆56 Updated this week
- Tenstorrent Firmware repository ☆21 Updated this week
- Tenstorrent MLIR compiler ☆185 Updated this week
- Tenstorrent Kernel Module ☆54 Updated this week
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,210 Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆117 Updated this week
- TT-Studio : An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆37 Updated this week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆51 Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 Updated 3 weeks ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆298 Updated 3 months ago
- Buda Compiler Backend for Tenstorrent devices ☆30 Updated 5 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆98 Updated this week
- TVM for Tenstorrent ASICs ☆26 Updated 2 weeks ago
- An experimental CPU backend for Triton ☆153 Updated 3 months ago
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆676 Updated last month
- Repository of model demos using TT-Buda ☆62 Updated 5 months ago
- OpenAI Triton backend for Intel® GPUs ☆208 Updated this week
- Development repository for the Triton language and compiler ☆131 Updated this week
- Tenstorrent console based hardware information program ☆53 Updated last week
- Shared Middle-Layer for Triton Compilation ☆286 Updated 3 weeks ago
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆481 Updated this week
- Perplexity GPU Kernels ☆469 Updated this week
- AI Tensor Engine for ROCm ☆279 Updated this week
- CUDA GPU Benchmark ☆30 Updated 7 months ago
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆66 Updated 3 months ago
- ☆119 Updated 8 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆573 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆102 Updated this week
- Fast low-bit matmul kernels in Triton ☆371 Updated last week