notY0rick / cuda_practice
My own repository containing the code I wrote to practice CUDA programming.
☆29 Updated last year
Related projects:
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆152 Updated 11 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆88 Updated 11 months ago
- A really tiny autograd engine ☆85 Updated 5 months ago
- Tutorials on tinygrad ☆157 Updated 2 weeks ago
- A puzzle to learn about prompting ☆106 Updated last year
- Solve puzzles. Learn CUDA. ☆53 Updated 9 months ago
- small auto-grad engine inspired by Karpathy's micrograd and PyTorch ☆91 Updated 2 months ago
- could we make an ml stack in 100,000 lines of code? ☆22 Updated 2 months ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆42 Updated last year
- Mixed precision training from scratch with Tensors and CUDA ☆18 Updated 4 months ago
- Tensor library with autograd using only Rust's standard library ☆61 Updated 2 months ago
- An interactive exploration of Transformer programming. ☆243 Updated 10 months ago
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.) ☆48 Updated last month
- ☆47 Updated last month
- Puzzles for exploring transformers ☆293 Updated last year
- Alex Krizhevsky's original code from Google Code ☆185 Updated 8 years ago
- Introductory lecture on PyTorch ☆15 Updated 2 years ago
- comma body does a loop around the office ☆26 Updated 10 months ago
- Highly commented implementations of Transformers in PyTorch ☆126 Updated last year
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments because I have the memory of a goldfish ☆165 Updated last month
- Stream of my favorite papers and links ☆34 Updated 2 weeks ago
- ☆52 Updated last week
- Simple Transformer in Jax ☆100 Updated 2 months ago
- Solve puzzles to improve your tinygrad skills! ☆70 Updated last month
- ☆97 Updated 5 months ago
- ☆124 Updated 7 months ago
- Resources from the EleutherAI Math Reading Group ☆50 Updated 2 months ago
- Functional local implementations of main model parallelism approaches ☆93 Updated last year
- LLM training in simple, raw C/CUDA ☆17 Updated 4 months ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆77 Updated 9 months ago