kuterd / nv_isa_solver
Nvidia Instruction Set Specification Generator
☆253Updated 8 months ago
Alternatives and similar repositories for nv_isa_solver:
Users who are interested in nv_isa_solver are comparing it to the libraries listed below.
- Learning about CUDA by writing PTX code.☆124Updated last year
- High-Performance SGEMM on CUDA devices☆86Updated 2 months ago
- GPUOcelot: A dynamic compilation framework for PTX☆181Updated last month
- Unofficial description of the CUDA assembly (SASS) instruction sets.☆73Updated 2 weeks ago
- ctypes wrappers for HIP, CUDA, and OpenCL☆129Updated 8 months ago
- RDNA3 emulator☆52Updated last week
- An experimental CPU backend for Triton☆100Updated last week
- Fastest kernels written from scratch☆199Updated 2 weeks ago
- Tenstorrent MLIR compiler☆105Updated this week
- Visualization of cache-optimized matrix multiplication☆105Updated last week
- Exploring the scalable matrix extension of the Apple M4 processor☆168Updated 4 months ago
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings.☆84Updated this week
- parallelized hyperdimensional tictactoe☆114Updated 6 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆39Updated last week
- An unofficial CUDA assembler, for all generations of SASS, hopefully :)☆462Updated last year
- ☆437Updated last week
- Apple GPU microarchitecture☆504Updated 6 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems☆234Updated this week
- Attention in SRAM on Tenstorrent Grayskull☆32Updated 8 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.☆129Updated last week
- ☆138Updated this week
- ☆72Updated this week
- Sniff CUDA ioctls☆190Updated last year
- LLM training in simple, raw C/CUDA☆92Updated 10 months ago
- PyTorch from scratch in pure C/CUDA and Python☆40Updated 5 months ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo.☆99Updated 3 weeks ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs☆341Updated last month
- IREE's PyTorch Frontend, based on Torch Dynamo.☆72Updated this week