tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆26 Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆313 Updated 7 months ago
- TT-Studio: An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆37 Updated last week
- Tenstorrent MLIR compiler ☆213 Updated last week
- Buda Compiler Backend for Tenstorrent devices ☆30 Updated 7 months ago
- Tenstorrent Kernel Module ☆57 Updated this week
- TVM for Tenstorrent ASICs ☆27 Updated 2 months ago
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆60 Updated this week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆51 Updated this week
- AI Tensor Engine for ROCm ☆306 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆221 Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆35 Updated 3 months ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆141 Updated this week
- An experimental CPU backend for Triton ☆164 Updated 2 weeks ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,265 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆400 Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆75 Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆116 Updated last week
- MLIR-based partitioning system ☆150 Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆101 Updated this week
- Repository of model demos using TT-Buda ☆63 Updated 7 months ago
- ☆51 Updated this week
- Perplexity GPU Kernels ☆531 Updated 3 weeks ago
- Shared Middle-Layer for Triton Compilation ☆313 Updated last month
- Frontend integration for PyTorch with tt-mlir ☆24 Updated 2 weeks ago
- Development repository for the Triton language and compiler ☆137 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆362 Updated last week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆713 Updated 3 months ago
- ☆27 Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆398 Updated last week
- Perplexity open source garden for inference technology ☆274 Updated last week