kuterd / nv_isa_solver
Nvidia Instruction Set Specification Generator
☆290 Updated last year
Alternatives and similar repositories for nv_isa_solver
Users who are interested in nv_isa_solver are comparing it to the libraries listed below.
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆136 Updated last month
- GPUOcelot: A dynamic compilation framework for PTX ☆207 Updated 6 months ago
- Custom PTX Instruction Benchmark ☆126 Updated 6 months ago
- High-Performance SGEMM on CUDA devices ☆97 Updated 7 months ago
- Learning about CUDA by writing PTX code. ☆134 Updated last year
- ☆58 Updated this week
- RDNA3 emulator ☆54 Updated 4 months ago
- Tenstorrent MLIR compiler ☆174 Updated last week
- Sniff CUDA ioctls ☆204 Updated 2 years ago
- Super fast FP32 matrix multiplication on RDNA3 ☆71 Updated 4 months ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆132 Updated 7 months ago
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models ☆48 Updated last week
- TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer (WIP) for Triton Kernels ☆144 Updated last week
- ☆450 Updated 4 months ago
- An experimental CPU backend for Triton ☆145 Updated 2 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆94 Updated this week
- Visualization of cache-optimized matrix multiplication ☆155 Updated 5 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor ☆195 Updated 9 months ago
- ☆49 Updated 7 months ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆102 Updated this week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆74 Updated last week
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆105 Updated last week
- GPU documentation for humans ☆119 Updated last week
- AI Tensor Engine for ROCm ☆254 Updated this week
- Apple GPU microarchitecture ☆547 Updated 11 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆38 Updated last year
- ☆111 Updated 5 months ago
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆299 Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆112 Updated 3 months ago