tenstorrent / tt-metal
TT-NN operator library, and TT-Metalium low level kernel programming model.
☆681 Updated this week
Alternatives and similar repositories for tt-metal:
Users who are interested in tt-metal are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆307 Updated 2 weeks ago
- Tenstorrent MLIR compiler ☆109 Updated this week
- ⭐️ TTNN Compiler for PyTorch 2.0 ⭐️ It enables running PyTorch 2.0 models on Tenstorrent hardware ☆33 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆373 Updated this week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆354 Updated this week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆457 Updated last week
- torchtrail: trace the graph of torch functions and modules for visualization, reports, etc ☆25 Updated 9 months ago
- A comprehensive tool for visualizing and analyzing model execution, offering interactive graphs, memory plots, tensor details, buffer ove… ☆27 Updated this week
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆465 Updated last year
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆30 Updated this week
- Awesome resources for GPUs ☆554 Updated last year
- This is the top-level repository for the Accel-Sim framework. ☆378 Updated 2 weeks ago
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,481 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆182 Updated last month
- AI Tensor Engine for ROCm ☆142 Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆74 Updated this week
- Repository of model demos using TT-Buda ☆63 Updated this week
- An open-source efficient deep learning framework/compiler, written in Python. ☆692 Updated last month
- Nvidia Instruction Set Specification Generator ☆253 Updated 8 months ago
- Exocompilation for productive programming of hardware accelerators ☆584 Updated 2 weeks ago
- Buda Compiler Backend for Tenstorrent devices ☆28 Updated last month
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆834 Updated last week
- HIPIFY: Convert CUDA to Portable C++ Code ☆567 Updated last week
- OpenAI Triton backend for Intel® GPUs ☆172 Updated this week
- Berkeley's Spatial Array Generator ☆911 Updated last month
- An experimental CPU backend for Triton ☆103 Updated this week
- CUDA Kernel Benchmarking Library ☆600 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆234 Updated 2 weeks ago
- IREE plugin repository for the AMD AIE accelerator ☆87 Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆572 Updated last month