tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆20 · Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the repositories listed below.
- Tenstorrent TT-BUDA Repository ☆315 · Updated 5 months ago
- Tenstorrent MLIR compiler ☆174 · Updated this week
- TT-Studio : An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆32 · Updated last week
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆54 · Updated this week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆49 · Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆102 · Updated this week
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,097 · Updated this week
- Tenstorrent Kernel Module ☆51 · Updated last week
- Tenstorrent Firmware repository ☆19 · Updated this week
- Buda Compiler Backend for Tenstorrent devices ☆30 · Updated 5 months ago
- TVM for Tenstorrent ASICs ☆26 · Updated this week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆467 · Updated this week
- Tenstorrent console based hardware information program ☆52 · Updated last week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆288 · Updated 2 months ago
- OpenAI Triton backend for Intel® GPUs ☆205 · Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆665 · Updated 3 weeks ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆95 · Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆32 · Updated 5 months ago
- Backward compatible ML compute opset inspired by HLO/MHLO ☆529 · Updated last week
- Repository of model demos using TT-Buda ☆62 · Updated 5 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆351 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆452 · Updated this week
- Development repository for the Triton language and compiler ☆127 · Updated this week
- Stores documents and resources used by the OpenXLA developer community ☆128 · Updated last year
- Shared Middle-Layer for Triton Compilation ☆275 · Updated this week
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆63 · Updated 2 months ago
- An experimental CPU backend for Triton ☆146 · Updated 3 months ago
- ☆229 · Updated last year
- AI Tensor Engine for ROCm ☆260 · Updated this week
- ☆64 · Updated last week