Qazalin / remu
RDNA3 emulator
☆54 Updated 3 weeks ago
Alternatives and similar repositories for remu:
Users who are interested in remu are comparing it to the libraries listed below.
- tenstorrent kernel from twitch ☆27 Updated last year
- ctypes wrappers for HIP, CUDA, and OpenCL ☆129 Updated 10 months ago
- Nvidia Instruction Set Specification Generator ☆260 Updated 10 months ago
- Tenstorrent MLIR compiler ☆122 Updated this week
- ☆28 Updated last month
- FP4 MAC Array ☆17 Updated last year
- Custom PTX Instruction Benchmark ☆123 Updated 2 months ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆47 Updated this week
- Tenstorrent system interface library ☆16 Updated this week
- The Finite Field Assembly Programming Language ☆36 Updated 3 weeks ago
- ☆58 Updated 10 months ago
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆42 Updated this week
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- A GLSL compiler targeting SPIR-V mlir ☆20 Updated 6 months ago
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r… ☆154 Updated last year
- Enabling tinygrad compatibility with the Google Edge TPU ☆77 Updated 8 months ago
- Tensor library with autograd using only Rust's standard library ☆67 Updated 10 months ago
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆91 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆187 Updated 3 months ago
- ☆54 Updated 10 months ago
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆94 Updated this week
- Frontend integration for PyTorch with tt-mlir ☆14 Updated this week
- ☆444 Updated last month
- Run 64-bit Linux on LiteX + RocketChip ☆196 Updated 9 months ago
- Repo for AI Compiler team. The intended purpose of this repo is for implementation of a PJRT device. ☆16 Updated this week
- GPU documentation for humans ☆46 Updated 2 weeks ago
- Super fast FP32 matrix multiplication on RDNA3 ☆51 Updated last month
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆82 Updated this week
- Sniff CUDA ioctls ☆192 Updated 2 years ago
- ☆33 Updated this week