tenstorrent / tt-studio
TT-Studio: An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applications.
☆39 · Updated this week
Alternatives and similar repositories for tt-studio
Users interested in tt-studio are comparing it to the libraries listed below.
- Tenstorrent console-based hardware information program ☆58 · Updated this week
- Repository of model demos using TT-Buda ☆63 · Updated 9 months ago
- ☆43 · Updated this week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆53 · Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆166 · Updated this week
- Attention in SRAM on Tenstorrent Grayskull ☆40 · Updated last year
- Tenstorrent MLIR compiler ☆231 · Updated this week
- ☆83 · Updated last month
- ☆15 · Updated 2 months ago
- [Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆61 · Updated last week
- Fast and Furious AMD Kernels ☆331 · Updated 2 weeks ago
- Tenstorrent TT-BUDA Repository ☆314 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆26 · Updated this week
- ☆27 · Updated 9 months ago
- ☆87 · Updated last week
- High-Performance SGEMM on CUDA devices ☆115 · Updated 11 months ago
- ☆128 · Updated 2 months ago
- AI Tensor Engine for ROCm ☆334 · Updated this week
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆50 · Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆91 · Updated this week
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Updated 7 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆188 · Updated 3 weeks ago
- Super fast FP32 matrix multiplication on RDNA3 ☆82 · Updated 9 months ago
- Custom PTX Instruction Benchmark ☆137 · Updated 10 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆48 · Updated 4 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆148 · Updated this week
- Repository for AI model benchmarking on TT-Buda ☆15 · Updated 10 months ago
- Nvidia Instruction Set Specification Generator ☆309 · Updated last year
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆763 · Updated 3 weeks ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆36 · Updated 4 months ago