tenstorrent / tt-metal
TT-NN operator library, and TT-Metalium low level kernel programming model.
☆627 Updated this week
Alternatives and similar repositories for tt-metal:
Users who are interested in tt-metal are comparing it to the libraries listed below.
- Tenstorrent TT-BUDA Repository ☆290 Updated 2 months ago
- Tenstorrent MLIR compiler ☆91 Updated this week
- ⭐️ TTNN Compiler for PyTorch 2.0 ⭐️ It enables running PyTorch 2.0 models on Tenstorrent hardware ☆30 Updated this week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆335 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆350 Updated this week
- torchtrail: trace the graph of torch functions and modules for visualization, reports, etc ☆25 Updated 8 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆169 Updated last week
- Repository of model demos using TT-Buda ☆62 Updated 2 months ago
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆28 Updated this week
- Fast CUDA matrix multiplication from scratch ☆634 Updated last year
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆426 Updated last year
- Backward compatible ML compute opset inspired by HLO/MHLO ☆446 Updated last week
- A comprehensive tool for visualizing and analyzing model execution, offering interactive graphs, memory plots, tensor details, buffer ove… ☆25 Updated this week
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,424 Updated this week
- This is the top-level repository for the Accel-Sim framework. ☆345 Updated this week
- ☆137 Updated this week
- collection of benchmarks to measure basic GPU capabilities ☆296 Updated last week
- Awesome resources for GPUs ☆546 Updated last year
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆79 Updated this week
- Tenstorrent Kernel Module ☆37 Updated 3 weeks ago
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆523 Updated last week
- An open source reimplementation of Google's Tensor Processing Unit (TPU). ☆405 Updated 7 years ago
- OpenSource GPU, in Verilog, loosely based on RISC-V ISA ☆927 Updated 3 months ago
- An experimental CPU backend for Triton ☆90 Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆812 Updated this week
- Shared Middle-Layer for Triton Compilation ☆226 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆212 Updated 3 years ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆233 Updated this week
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it ☆502 Updated 3 weeks ago