tinygrad / gpuctypes
ctypes wrappers for HIP, CUDA, and OpenCL
☆130 · Updated last year
Alternatives and similar repositories for gpuctypes
Users interested in gpuctypes are comparing it to the libraries listed below.
- High-Performance SGEMM on CUDA devices ☆110 · Updated 9 months ago
- RDNA3 emulator ☆54 · Updated 6 months ago
- Learning about CUDA by writing PTX code. ☆147 · Updated last year
- Nvidia Instruction Set Specification Generator ☆298 · Updated last year
- ☆78 · Updated this week
- Custom PTX Instruction Benchmark ☆132 · Updated 8 months ago
- Sniff CUDA ioctls ☆216 · Updated 2 years ago
- Quantized LLM training in pure CUDA/C++. ☆215 · Updated this week
- LLM training in simple, raw C/CUDA ☆108 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆212 · Updated 9 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆195 · Updated 2 years ago
- ☆93 · Updated last week
- Super fast FP32 matrix multiplication on RDNA3 ☆78 · Updated 7 months ago
- FP4 MAC Array ☆19 · Updated last year
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆140 · Updated this week
- Tutorials on tinygrad ☆438 · Updated last month
- Attention in SRAM on Tenstorrent Grayskull ☆38 · Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆306 · Updated 2 weeks ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 · Updated 6 months ago
- tiny code to access tenstorrent blackhole ☆60 · Updated 5 months ago
- Step by step implementation of a fast softmax kernel in CUDA ☆54 · Updated 10 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆150 · Updated 2 years ago
- An implementation of delta-iris in tinygrad ☆72 · Updated last year
- ☆337 · Updated last week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆138 · Updated this week
- Fast and Furious AMD Kernels ☆110 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated this week
- Tenstorrent MLIR compiler ☆211 · Updated this week
- ☆89 · Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 · Updated 2 months ago