moritztng / grayskull-attention
Attention in SRAM on Tenstorrent Grayskull
☆38 Updated last year
Alternatives and similar repositories for grayskull-attention
Users who are interested in grayskull-attention are comparing it to the libraries listed below.
- Tenstorrent MLIR compiler ☆174 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆44 Updated last week
- High-Performance SGEMM on CUDA devices ☆97 Updated 7 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆95 Updated last month
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆102 Updated this week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆74 Updated this week
- MLIR-based partitioning system ☆123 Updated this week
- Buda Compiler Backend for Tenstorrent devices ☆30 Updated 4 months ago
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆64 Updated 3 weeks ago
- Super fast FP32 matrix multiplication on RDNA3 ☆71 Updated 4 months ago
- ☆86 Updated 9 months ago
- Custom PTX Instruction Benchmark ☆126 Updated 5 months ago
- ☆42 Updated 3 months ago
- TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer(WIP) for Triton Kernels ☆144 Updated last week
- ☆111 Updated 5 months ago
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆21 Updated this week
- ☆33 Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆52 Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆94 Updated this week
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆132 Updated 7 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆207 Updated 6 months ago
- An experimental CPU backend for Triton ☆145 Updated 2 months ago
- Ahead of Time (AOT) Triton Math Library ☆75 Updated this week
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆49 Updated this week
- ☆58 Updated this week
- TVM for Tenstorrent ASICs ☆25 Updated last week
- extensible collectives library in triton ☆88 Updated 4 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆40 Updated last year
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆113 Updated last year
- ☆41 Updated this week