geohot / tt-twitch
tenstorrent kernel from twitch
☆27 Updated last year
Alternatives and similar repositories for tt-twitch
Users that are interested in tt-twitch are comparing it to the libraries listed below
- RDNA3 emulator ☆54 Updated 3 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated last month
- ☆28 Updated last month
- Attention in SRAM on Tenstorrent Grayskull ☆35 Updated 9 months ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆47 Updated this week
- High-Performance SGEMM on CUDA devices ☆91 Updated 3 months ago
- ☆51 Updated 9 months ago
- FP4 MAC Array ☆17 Updated last year
- Embedded Universal DSL: a good DSL for us, by us ☆36 Updated this week
- ☆13 Updated 2 months ago
- ☆14 Updated 5 months ago
- A tracing JIT compiler for PyTorch ☆13 Updated 3 years ago
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆16 Updated this week
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research ☆109 Updated last year
- Learning about CUDA by writing PTX code. ☆129 Updated last year
- Tenstorrent MLIR compiler ☆122 Updated this week
- ctypes wrappers for HIP, CUDA, and OpenCL ☆129 Updated 10 months ago
- Super fast FP32 matrix multiplication on RDNA3 ☆51 Updated last month
- Personal solutions to the Triton Puzzles ☆18 Updated 9 months ago
- Tensor library with autograd using only Rust's standard library ☆67 Updated 10 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated last week
- AMD’s C++ library for accelerating tensor primitives ☆40 Updated this week
- Nvidia Instruction Set Specification Generator ☆260 Updated 10 months ago
- LLM training in simple, raw C/CUDA ☆94 Updated last year
- The Finite Field Assembly Programming Language ☆36 Updated last month
- ☆16 Updated 7 months ago
- Loop Nest - Linear algebra compiler and code generator. ☆22 Updated 2 years ago
- ☆21 Updated 2 months ago
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆91 Updated last week
- ☆12 Updated last week