moritztng / grayskull-attention
Attention in SRAM on Tenstorrent Grayskull
☆35 · Updated 9 months ago
Alternatives and similar repositories for grayskull-attention:
Users who are interested in grayskull-attention are comparing it to the libraries listed below:
- High-Performance SGEMM on CUDA devices ☆90 · Updated 3 months ago
- Tenstorrent MLIR compiler ☆122 · Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆84 · Updated last week
- Reference Kernels for the Leaderboard ☆42 · Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 · Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 · Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆46 · Updated this week
- Explore training for quantized models ☆18 · Updated 4 months ago
- ☆102 · Updated last month
- The Riallto Open Source Project from AMD ☆77 · Updated 3 weeks ago
- ☆13 · Updated 2 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆106 · Updated 6 months ago
- Buda Compiler Backend for Tenstorrent devices ☆28 · Updated last month
- LLM training in simple, raw C/CUDA ☆94 · Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 · Updated last month
- High-speed GEMV kernels, at most 2.7x speedup compared to the PyTorch baseline. ☆106 · Updated 9 months ago
- Custom PTX Instruction Benchmark ☆123 · Updated 2 months ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆14 · Updated 3 months ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆126 · Updated 4 months ago
- An experimental CPU backend for Triton ☆110 · Updated last week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆169 · Updated last month
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆102 · Updated this week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆37 · Updated last week
- ☆16 · Updated 7 months ago
- ☆78 · Updated 6 months ago
- ☆15 · Updated last year
- ☆202 · Updated 2 weeks ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆38 · Updated 9 months ago
- A lightweight, Pythonic frontend for MLIR ☆81 · Updated last year
- ☆26 · Updated last year