tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆24 · Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆315 · Updated 7 months ago
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆51 · Updated this week
- Tenstorrent Kernel Module ☆55 · Updated this week
- Tenstorrent MLIR compiler ☆206 · Updated this week
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆60 · Updated this week
- TT-Studio: An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆39 · Updated last week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆135 · Updated this week
- Buda Compiler Backend for Tenstorrent devices ☆30 · Updated 7 months ago
- TVM for Tenstorrent ASICs ☆27 · Updated last month
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,250 · Updated this week
- Tenstorrent console based hardware information program ☆54 · Updated this week
- Repository of model demos using TT-Buda ☆63 · Updated 7 months ago
- Tenstorrent Firmware Update Utility ☆11 · Updated 2 weeks ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆99 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆219 · Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆70 · Updated last month
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆35 · Updated 2 months ago
- Tenstorrent Firmware repository ☆24 · Updated this week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆515 · Updated this week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆310 · Updated 4 months ago
- AI Tensor Engine for ROCm ☆296 · Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆705 · Updated 3 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆101 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆481 · Updated this week
- Perplexity GPU Kernels ☆519 · Updated last week
- An experimental CPU backend for Triton ☆155 · Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆359 · Updated this week
- MLIR-based partitioning system ☆144 · Updated this week
- Development repository for the Triton language and compiler ☆136 · Updated last week
- ☆51 · Updated last week