tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆22 Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆316 Updated 6 months ago
- Tenstorrent MLIR compiler ☆187 Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆122 Updated this week
- TT-Studio : An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆36 Updated last week
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆57 Updated last week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆50 Updated this week
- TVM for Tenstorrent ASICs ☆26 Updated last month
- Buda Compiler Backend for Tenstorrent devices ☆30 Updated 6 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 Updated last month
- An experimental CPU backend for Triton ☆153 Updated last week
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,230 Updated this week
- Tenstorrent Kernel Module ☆54 Updated this week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆552 Updated this week
- AI Tensor Engine for ROCm ☆285 Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆691 Updated 2 months ago
- Tenstorrent console based hardware information program ☆53 Updated last week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆303 Updated 3 months ago
- Shared Middle-Layer for Triton Compilation ☆289 Updated last week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆97 Updated this week
- MLIR-based partitioning system ☆137 Updated this week
- Perplexity GPU Kernels ☆488 Updated 3 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆601 Updated last week
- OpenAI Triton backend for Intel® GPUs ☆211 Updated this week
- Tenstorrent Firmware repository ☆23 Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆83 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆472 Updated this week
- Repository of model demos using TT-Buda ☆62 Updated 6 months ago
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆67 Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆379 Updated 2 weeks ago
- ☆124 Updated 9 months ago