Qazalin / remu
RDNA3 emulator
☆54 · Updated this week
Alternatives and similar repositories for remu:
Users who are interested in remu are comparing it to the repositories listed below.
- ctypes wrappers for HIP, CUDA, and OpenCL ☆129 · Updated 9 months ago
- tenstorrent kernel from twitch ☆27 · Updated last year
- Nvidia Instruction Set Specification Generator ☆255 · Updated 9 months ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆39 · Updated this week
- Tenstorrent MLIR compiler ☆119 · Updated this week
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆89 · Updated this week
- FP4 MAC Array ☆17 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆185 · Updated 2 months ago
- ☆54 · Updated 10 months ago
- Super fast FP32 matrix multiplication on RDNA3 ☆46 · Updated 3 weeks ago
- Buda Compiler Backend for Tenstorrent devices ☆28 · Updated 2 weeks ago
- Custom PTX Instruction Benchmark ☆123 · Updated last month
- ☆27 · Updated last month
- ⭐️ TTNN Compiler for PyTorch 2.0 ⭐️ It enables running PyTorch2.0 models on Tenstorrent hardware ☆34 · Updated this week
- The Finite Field Assembly Programming Language ☆36 · Updated last week
- Machine learning for machine code. ☆88 · Updated last week
- Sniff CUDA ioctls ☆192 · Updated last year
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆13 · Updated this week
- Can RL solve simple problems? ☆54 · Updated last year
- A GLSL compiler targeting SPIR-V mlir ☆19 · Updated 6 months ago
- High-Performance SGEMM on CUDA devices ☆90 · Updated 2 months ago
- Tenstorrent system interface library ☆16 · Updated this week
- Virtual machine for executing CUDA PTX without a GPU ☆33 · Updated last year
- Website for CS 265 ☆28 · Updated 3 months ago
- MLIR-based partitioning system ☆80 · Updated this week
- parallelized hyperdimensional tictactoe ☆118 · Updated 7 months ago
- Generate python ctypes classes from C headers. Requires LLVM clang ☆13 · Updated 8 months ago
- Tenstorrent Kernel Module ☆41 · Updated this week
- Tensor library with autograd using only Rust's standard library ☆67 · Updated 9 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆171 · Updated 5 months ago