moritztng / grayskull-attention
Attention in SRAM on Tenstorrent Grayskull
☆37Updated last year
Alternatives and similar repositories for grayskull-attention
Users who are interested in grayskull-attention are comparing it to the libraries listed below.
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆43Updated 4 months ago
- Tenstorrent MLIR compiler☆165Updated this week
- High-Performance SGEMM on CUDA devices☆98Updated 6 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆93Updated last month
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s…☆96Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer (WIP) for Triton Kernels☆138Updated this week
- An experimental CPU backend for Triton☆138Updated 2 months ago
- ☆20Updated 3 months ago
- ☆33Updated 2 weeks ago
- Custom PTX Instruction Benchmark☆126Updated 5 months ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices.☆132Updated 7 months ago
- The Riallto Open Source Project from AMD☆82Updated 3 months ago
- Super fast FP32 matrix multiplication on RDNA3☆70Updated 4 months ago
- MLIR-based partitioning system☆115Updated this week
- ☆110Updated 4 months ago
- Buda Compiler Backend for Tenstorrent devices☆29Updated 4 months ago
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X☆60Updated this week
- ☆85Updated 8 months ago
- A lightweight, Pythonic, frontend for MLIR☆80Updated last year
- ☆148Updated this week
- Unofficial description of the CUDA assembly (SASS) instruction sets.☆124Updated 2 weeks ago
- TVM for Tenstorrent ASICs☆24Updated last week
- A Data-Centric Compiler for Machine Learning☆84Updated last year
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆69Updated 2 weeks ago
- ☆27Updated last year
- Automatic differentiation for Triton Kernels☆11Updated last week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline.☆113Updated last year
- ☆41Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores☆40Updated last year
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆110Updated 2 months ago