tinygrad / gpuctypes
ctypes wrappers for HIP, CUDA, and OpenCL
☆129 Updated 8 months ago
Alternatives and similar repositories for gpuctypes:
Users who are interested in gpuctypes are comparing it to the libraries listed below:
- Nvidia Instruction Set Specification Generator ☆253 Updated 8 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆181 Updated last month
- Learning about CUDA by writing PTX code. ☆124 Updated last year
- Sniff CUDA ioctls ☆190 Updated last year
- RDNA3 emulator ☆52 Updated last week
- High-Performance SGEMM on CUDA devices ☆86 Updated 2 months ago
- ☆72 Updated this week
- Attention in SRAM on Tenstorrent Grayskull ☆32 Updated 8 months ago
- Tutorials on tinygrad ☆355 Updated 3 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆39 Updated this week
- Tenstorrent MLIR compiler ☆105 Updated this week
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆174 Updated last year
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1. ☆134 Updated 7 months ago
- Enabling tinygrad compatibility with the Google Edge TPU ☆76 Updated 6 months ago
- LLM training in simple, raw C/CUDA ☆92 Updated 10 months ago
- ☆437 Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆255 Updated last week
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆84 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆127 Updated last year
- tenstorrent kernel from twitch ☆27 Updated last year
- Alex Krizhevsky's original code from Google Code ☆190 Updated 9 years ago
- ☆191 Updated this week
- ☆290 Updated this week
- ☆86 Updated last year
- Scripts and environment for the tinybox ☆93 Updated 11 months ago
- Can RL solve simple problems? ☆54 Updated last year
- An implementation of delta-iris in tinygrad ☆72 Updated 7 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆109 Updated 3 weeks ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆341 Updated last month