srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,583 Updated last year
Alternatives and similar repositories for GPU-Puzzles
Users who are interested in GPU-Puzzles are comparing it to the libraries listed below:
- Solve puzzles. Improve your pytorch. ☆3,745 Updated last year
- Machine Learning Engineering Open Book ☆15,450 Updated last week
- The full minitorch student suite. ☆2,205 Updated last year
- Material for gpu-mode lectures ☆5,197 Updated last month
- Puzzles for learning Triton ☆2,036 Updated 11 months ago
- GPU programming related news and material links ☆1,741 Updated last month
- Tile primitives for speedy kernels ☆2,821 Updated last week
- Development repository for the Triton language and compiler ☆17,289 Updated this week
- LLM training in simple, raw C/CUDA ☆27,923 Updated 3 months ago
- What would you do with 1000 H100s... ☆1,113 Updated last year
- A Python framework for accelerated simulation, data generation and spatial computing. ☆5,661 Updated this week
- The Art of Debugging ☆957 Updated last month
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆9,222 Updated 2 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆10,064 Updated last year
- NanoGPT (124M) in 3 minutes ☆3,565 Updated last week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,128 Updated 2 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,856 Updated last month
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,824 Updated 4 months ago
- A concise but complete full-attention transformer with a set of promising experimental features from various papers ☆5,633 Updated this week
- Fast and memory-efficient exact attention ☆20,023 Updated this week
- Language model alignment-focused deep learning curriculum ☆1,487 Updated last year
- Schedule-Free Optimization in PyTorch ☆2,224 Updated 5 months ago
- Inference Llama 2 in one file of pure C ☆18,872 Updated last year
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,610 Updated this week
- Pen and paper exercises in machine learning ☆2,499 Updated last year
- ☆1,623 Updated last week
- ☆4,100 Updated last year
- CUDA Python: Performance meets Productivity ☆3,006 Updated last week
- llama3 implementation one matrix multiplication at a time ☆15,172 Updated last year
- It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant will benefit from reading it; ho… ☆4,708 Updated 2 months ago