tinygrad / gpuctypes
ctypes wrappers for HIP, CUDA, and OpenCL
☆129 Updated 9 months ago
Alternatives and similar repositories for gpuctypes:
Users interested in gpuctypes are comparing it to the libraries listed below.
- Nvidia Instruction Set Specification Generator ☆255 Updated 9 months ago
- Learning about CUDA by writing PTX code. ☆128 Updated last year
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆39 Updated this week
- RDNA3 emulator ☆54 Updated this week
- High-Performance SGEMM on CUDA devices ☆90 Updated 2 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆185 Updated 2 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆179 Updated last year
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆131 Updated last year
- ☆82 Updated this week
- Tutorials on tinygrad ☆369 Updated 3 weeks ago
- Custom PTX Instruction Benchmark ☆123 Updated last month
- Generate python ctypes classes from C headers. Requires LLVM clang ☆13 Updated 8 months ago
- An implementation of delta-iris in tinygrad ☆72 Updated 8 months ago
- Tenstorrent MLIR compiler ☆119 Updated this week
- Sniff CUDA ioctls ☆192 Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆257 Updated 2 weeks ago
- Python bindings for ggml ☆140 Updated 7 months ago
- Solve puzzles to improve your tinygrad skills! ☆122 Updated last month
- Visualization of cache-optimized matrix multiplication ☆116 Updated last month
- Enabling tinygrad compatibility with the Google Edge TPU ☆77 Updated 7 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆40 Updated this week
- pytorch from scratch in pure C/CUDA and python ☆40 Updated 6 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆109 Updated last week
- ☆130 Updated 5 months ago
- parallelized hyperdimensional tictactoe ☆118 Updated 7 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆65 Updated 3 weeks ago
- MLIR-based partitioning system ☆80 Updated this week
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆89 Updated this week
- Super fast FP32 matrix multiplication on RDNA3 ☆46 Updated 3 weeks ago