cverrier / tinygrad-tutos
Tutorials about tinygrad, an end-to-end deep learning stack
☆59 · Updated this week
Alternatives and similar repositories for tinygrad-tutos
Users that are interested in tinygrad-tutos are comparing it to the libraries listed below
- could we make an ml stack in 100,000 lines of code? ☆46 · Updated last year
- Visualization of cache-optimized matrix multiplication ☆157 · Updated 9 months ago
- small autograd engine inspired by Karpathy's micrograd and PyTorch ☆277 · Updated last year
- Tutorials on tinygrad ☆445 · Updated 2 months ago
- Solve puzzles to improve your tinygrad skills! ☆171 · Updated 2 months ago
- Tensor library with autograd using only Rust's standard library ☆70 · Updated last year
- parallelized hyperdimensional tictactoe ☆126 · Updated last year
- pytorch from scratch in pure C/CUDA and python ☆41 · Updated last year
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1. ☆192 · Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆371 · Updated 8 months ago
- Learning about CUDA by writing PTX code. ☆150 · Updated last year
- Learnings and programs related to CUDA ☆432 · Updated 6 months ago
- Nvidia Instruction Set Specification Generator ☆306 · Updated last year
- ☆96 · Updated last year
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆662 · Updated last week
- ☆97 · Updated last week
- Following Karpathy with GPT-2 implementation and training, writing lots of comments because I have the memory of a goldfish ☆172 · Updated last year
- Gradient descent is cool and all, but what if we could delete it? ☆104 · Updated 4 months ago
- Simple Transformer in Jax ☆141 · Updated last year
- SIMD quantization kernels ☆93 · Updated 3 months ago
- PTX-Tutorial Written Purely By AIs (OpenAI's Deep Research and Claude 3.7) ☆66 · Updated 9 months ago
- An implementation of a deep learning framework and models in C ☆48 · Updated 8 months ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆175 · Updated 11 months ago
- Quantized LLM training in pure CUDA/C++. ☆226 · Updated this week
- throwaway GPT inference ☆141 · Updated last year
- ☆274 · Updated 3 months ago
- ☆536 · Updated 4 months ago
- noise_step: Training in 1.58b With No Gradient Memory ☆220 · Updated last year
- Simple MPI implementation for prototyping or learning ☆296 · Updated 4 months ago
- Here's all my Python/Numba (CUDA) code for the encoder block I made :) ☆68 · Updated 8 months ago