tenstorrent / tt-inference-server
☆40 · Updated this week
Alternatives and similar repositories for tt-inference-server
Users who are interested in tt-inference-server are comparing it to the libraries listed below.
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆53 · Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆148 · Updated this week
- Tenstorrent MLIR compiler ☆218 · Updated last week
- Tenstorrent TT-BUDA Repository ☆314 · Updated 8 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆36 · Updated 3 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆39 · Updated last year
- An experimental CPU backend for Triton ☆167 · Updated last month
- Helpful kernel tutorials and examples for tile-based GPU programming ☆456 · Updated this week
- A fast full-system simulator of Tenstorrent hardware ☆34 · Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆133 · Updated this week
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆47 · Updated this week
- TT-Studio : An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆39 · Updated last week
- Tenstorrent Kernel Module ☆56 · Updated 2 weeks ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆138 · Updated 11 months ago
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆182 · Updated 5 months ago
- AMD-SHARK Inference Modeling and Serving ☆59 · Updated this week
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆427 · Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆102 · Updated this week
- ☆27 · Updated 9 months ago
- Buda Compiler Backend for Tenstorrent devices ☆30 · Updated 8 months ago
- kernels, of the mega variety ☆631 · Updated 2 months ago
- ☆120 · Updated last week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆133 · Updated this week
- AI Tensor Engine for ROCm ☆322 · Updated this week
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆73 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆26 · Updated this week
- IREE plugin repository for the AMD AIE accelerator ☆115 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆222 · Updated this week
- CUDA Matrix Multiplication Optimization ☆245 · Updated last year
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆327 · Updated 6 months ago