tinygrad / gpuctypes
ctypes wrappers for HIP, CUDA, and OpenCL
☆129 · Updated 11 months ago
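To show what a ctypes wrapper over a GPU runtime involves, here is a minimal hand-written sketch against two CUDA driver API calls (`cuInit`, `cuDeviceGetCount`), using only Python's standard `ctypes` module. This is not gpuctypes' own API. The helper name `cuda_device_count` is hypothetical, and the sketch assumes the driver library can be found with `ctypes.util.find_library`. On machines without a CUDA driver it returns `None` instead of raising.

```python
import ctypes
import ctypes.util


def cuda_device_count():
    """Hypothetical helper: count CUDA devices via raw ctypes bindings.

    Returns the device count as an int, or None if the CUDA driver
    library is missing or a call fails.
    """
    # Locate the driver library: libcuda on Linux, nvcuda on Windows.
    path = ctypes.util.find_library("cuda") or ctypes.util.find_library("nvcuda")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None

    # Declare the C signatures so ctypes marshals arguments correctly:
    #   CUresult cuInit(unsigned int Flags);
    #   CUresult cuDeviceGetCount(int *count);
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    lib.cuDeviceGetCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    lib.cuDeviceGetCount.restype = ctypes.c_int

    # CUDA_SUCCESS is 0; any other value is an error code.
    if lib.cuInit(0) != 0:
        return None
    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None
    return count.value


if __name__ == "__main__":
    print("CUDA devices:", cuda_device_count())
```

A generated binding layer does the same thing at scale: it declares `argtypes`/`restype` for every exported function so Python can call the C library directly, with no compiled extension module.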
Alternatives and similar repositories for gpuctypes
Users who are interested in gpuctypes are comparing it to the libraries listed below.
- Nvidia Instruction Set Specification Generator ☆267 · Updated 10 months ago
- Learning about CUDA by writing PTX code. ☆131 · Updated last year
- ☆30 · Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆59 · Updated this week
- RDNA3 emulator ☆54 · Updated last month
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆183 · Updated last year
- High-Performance SGEMM on CUDA devices ☆92 · Updated 4 months ago
- Solve puzzles to improve your tinygrad skills! ☆128 · Updated 2 months ago
- An implementation of delta-iris in tinygrad ☆72 · Updated 9 months ago
- Tutorials on tinygrad ☆379 · Updated 3 weeks ago
- Enabling tinygrad compatibility with the Google Edge TPU ☆77 · Updated 9 months ago
- Scripts and environment for the tinybox ☆93 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆192 · Updated 3 months ago
- Could we make an ML stack in 100,000 lines of code? ☆42 · Updated 10 months ago
- Sniff CUDA ioctls ☆192 · Updated 2 years ago
- Custom PTX Instruction Benchmark ☆126 · Updated 3 months ago
- Tenstorrent MLIR compiler ☆132 · Updated this week
- ☆84 · Updated last week
- Tensor library with autograd using only Rust's standard library ☆68 · Updated 11 months ago
- Simple byte-pair encoding mechanism used for tokenization, written purely in C ☆132 · Updated 6 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆35 · Updated 10 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆133 · Updated last year
- ☆443 · Updated last month
- A really tiny autograd engine ☆94 · Updated last week
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 · Updated last month
- LLM training in simple, raw C/Metal Shading Language ☆54 · Updated last year
- ☆34 · Updated this week
- Tenstorrent kernel from Twitch ☆27 · Updated last year
- FP4 MAC Array ☆18 · Updated last year
- LLM training in simple, raw C/CUDA ☆99 · Updated last year