tenstorrent / tt-inference-serverLinks

☆34

Alternatives and similar repositories for tt-inference-server

Users that are interested in tt-inference-server are comparing it to the libraries listed below

Sorting:

tenstorrent / tt-forge-fe
The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per…
☆51Updated this week
tenstorrent / tt-buda
Tenstorrent TT-BUDA Repository
☆313Updated 7 months ago
tenstorrent / tt-mlir
Tenstorrent MLIR compiler
☆213Updated last week
tenstorrent / tt-forge
Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s…
☆141Updated this week
tenstorrent / tt-studio
TT-Studio : An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic…
☆37Updated last week
moritztng / grayskull-attention
Attention in SRAM on Tenstorrent Grayskull
☆39Updated last year
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆164Updated 2 weeks ago
ROCm / aiter
AI Tensor Engine for ROCm
☆306Updated this week
tenstorrent / tt-tvm
TVM for Tenstorrent ASICs
☆27Updated 2 months ago
tenstorrent / tt-kmd
Tenstorrent Kernel Module
☆57Updated this week
RadeonFlow / RadeonFlow_Kernels
Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
☆72Updated last week
tenstorrent / tt-budabackend
Buda Compiler Backend for Tenstorrent devices
☆30Updated 7 months ago
mk1-project / quickreduce
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
☆35Updated 3 months ago
nod-ai / amd-shark-ai
AMD-SHARK Inference Modeling and Serving
☆56Updated this week
tenstorrent / tt-metal
TT-NN operator library, and TT-Metalium low level kernel programming model.
☆1,265Updated this week
tenstorrent / tt-xla
Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device.
☆44Updated this week
tenstorrent / tensix-isa-simulator
☆27Updated 8 months ago
iree-org / iree-turbine
IREE's PyTorch Frontend, based on Torch Dynamo.
☆101Updated this week
0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆161Updated 4 months ago
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆221Updated this week
gpuocelot / gpuocelot
GPUOcelot: A dynamic compilation framework for PTX
☆216Updated 9 months ago
tenstorrent / tt-smi
Tenstorrent console based hardware information program
☆57Updated this week
tenstorrent / ttnn-visualizer
A comprehensive tool for visualizing and analyzing model execution, offering interactive graphs, memory plots, tensor details, buffer ove…
☆40Updated last week
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)
☆676Updated last week
bertmaher / simplegemm
☆126Updated last month
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆160Updated 2 weeks ago
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆116Updated last week
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆313Updated last month
tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆26Updated this week
Xilinx / mlir-air
☆118Updated last week