kuterd / nv_isa_solver
Nvidia Instruction Set Specification Generator
☆298 · Updated last year
Alternatives and similar repositories for nv_isa_solver
Users who are interested in nv_isa_solver are comparing it to the repositories listed below.
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆160 · Updated 3 months ago
- Super fast FP32 matrix multiplication on RDNA3 ☆78 · Updated 7 months ago
- Learning about CUDA by writing PTX code. ☆147 · Updated last year
- Tenstorrent MLIR compiler ☆211 · Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆212 · Updated 9 months ago
- High-Performance SGEMM on CUDA devices ☆110 · Updated 9 months ago
- Custom PTX Instruction Benchmark ☆132 · Updated 8 months ago
- ☆78 · Updated this week
- Apple GPU microarchitecture ☆559 · Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor ☆209 · Updated last year
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models ☆50 · Updated 2 months ago
- RDNA3 emulator ☆54 · Updated 6 months ago
- Visualization of cache-optimized matrix multiplication ☆155 · Updated 8 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆104 · Updated this week
- An experimental CPU backend for Triton ☆160 · Updated last week
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆120 · Updated this week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆553 · Updated 2 years ago
- Sniff CUDA ioctls ☆216 · Updated 2 years ago
- Attention in SRAM on Tenstorrent Grayskull ☆38 · Updated last year
- ☆448 · Updated 7 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆101 · Updated this week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆140 · Updated this week
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆396 · Updated last week
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆113 · Updated last week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆171 · Updated this week
- ☆123 · Updated 3 weeks ago
- Fast and Furious AMD Kernels ☆110 · Updated this week
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Updated 5 months ago
- MLIR-based partitioning system ☆148 · Updated this week