srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,442 Updated last year
Alternatives and similar repositories for GPU-Puzzles
Users who are interested in GPU-Puzzles are comparing it to the repositories listed below.
- Solve puzzles. Improve your pytorch. ☆3,706 Updated last year
- Material for gpu-mode lectures ☆4,991 Updated last week
- The full minitorch student suite. ☆2,158 Updated last year
- Machine Learning Engineering Open Book ☆14,957 Updated this week
- llama3 implementation one matrix multiplication at a time ☆15,123 Updated last year
- Puzzles for learning Triton ☆1,972 Updated 9 months ago
- GPU programming related news and material links ☆1,672 Updated 8 months ago
- LLM training in simple, raw C/CUDA ☆27,536 Updated 2 months ago
- NanoGPT (124M) in 3 minutes ☆3,074 Updated last month
- Development repository for the Triton language and compiler ☆16,769 Updated this week
- "Probabilistic Machine Learning" - a book series by Kevin Murphy ☆5,326 Updated 4 months ago
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆8,699 Updated last year
- ☆1,403 Updated 2 months ago
- Video+code lecture on building nanoGPT from scratch ☆4,331 Updated last year
- Tile primitives for speedy kernels ☆2,650 Updated last week
- A PyTorch native platform for training generative AI models ☆4,339 Updated this week
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆12,703 Updated last year
- CUDA Python: Performance meets Productivity ☆2,949 Updated last week
- Explanation to key concepts in ML ☆8,060 Updated 2 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,908 Updated last year
- Language model alignment-focused deep learning curriculum ☆1,456 Updated last year
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,669 Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆43,960 Updated 8 months ago
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. ☆6,039 Updated 2 weeks ago
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,071 Updated 2 weeks ago
- Schedule-Free Optimization in PyTorch ☆2,203 Updated 3 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,715 Updated last week
- Understanding Deep Learning - Simon J.D. Prince ☆7,851 Updated last week
- An autoregressive character-level language model for making more things ☆3,284 Updated last year
- An unnecessarily tiny implementation of GPT-2 in NumPy. ☆3,403 Updated 2 years ago