tenstorrent / tt-metal
TT-NN operator library, and TT-Metalium low level kernel programming model.
☆942 Updated this week
Alternatives and similar repositories for tt-metal
Users who are interested in tt-metal are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆313 Updated 2 months ago
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆47 Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆72 Updated this week
- Tenstorrent MLIR compiler ☆141 Updated this week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆44 Updated this week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆431 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆427 Updated this week
- torchtrail: trace the graph of torch functions and modules for visualization, reports, etc ☆25 Updated last month
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,565 Updated this week
- Exocompilation for productive programming of hardware accelerators ☆607 Updated this week
- Awesome resources for GPUs ☆572 Updated last year
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆1,314 Updated this week
- Frontend integration for PyTorch with tt-mlir ☆21 Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆878 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 Updated last week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆494 Updated last week
- CUDA Kernel Benchmarking Library ☆670 Updated this week
- Fast CUDA matrix multiplication from scratch ☆751 Updated last year
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆18 Updated this week
- Nvidia Instruction Set Specification Generator ☆278 Updated 11 months ago
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆1,322 Updated this week
- AI Tensor Engine for ROCm ☆208 Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆637 Updated last month
- Repository of model demos using TT-Buda ☆62 Updated 2 months ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆508 Updated 2 years ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆339 Updated this week
- Tile primitives for speedy kernels ☆2,478 Updated this week
- An open source reimplementation of Google's Tensor Processing Unit (TPU). ☆668 Updated 7 years ago
- Shared Middle-Layer for Triton Compilation ☆256 Updated this week
- A retargetable MLIR-based machine learning compiler and runtime toolkit. ☆3,197 Updated this week