srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆10,335 Updated 4 months ago
Alternatives and similar repositories for GPU-Puzzles:
Users who are interested in GPU-Puzzles are comparing it to the repositories listed below:
- Solve puzzles. Improve your pytorch. ☆3,359 Updated 6 months ago
- Machine Learning Engineering Open Book ☆12,353 Updated this week
- Material for gpu-mode lectures ☆3,501 Updated last week
- Puzzles for learning Triton ☆1,300 Updated last month
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆7,340 Updated 4 months ago
- The full minitorch student suite. ☆1,985 Updated 5 months ago
- A PyTorch native library for large model training ☆3,091 Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,462 Updated this week
- GPU programming related news and material links ☆1,312 Updated last week
- Schedule-Free Optimization in PyTorch ☆2,061 Updated last month
- llama3 implementation one matrix multiplication at a time ☆14,030 Updated 7 months ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆11,197 Updated this week
- You like pytorch? You like micrograd? You love tinygrad! ❤️ ☆27,548 Updated this week
- LLM training in simple, raw C/CUDA ☆25,047 Updated 3 months ago
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆8,655 Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆5,749 Updated last month
- Development repository for the Triton language and compiler ☆14,042 Updated this week
- Fast and memory-efficient exact attention ☆15,064 Updated this week
- Efficient Triton Kernels for LLM Training ☆4,183 Updated this week
- A concise but complete full-attention transformer with a set of promising experimental features from various papers ☆4,985 Updated last week
- CoreNet: A library for training deep neural networks ☆6,990 Updated 3 months ago
- Tile primitives for speedy kernels ☆1,923 Updated this week
- NanoGPT (124M) in 3.4 minutes ☆2,068 Updated last week
- ☆4,050 Updated 7 months ago
- PyTorch native post-training library ☆4,703 Updated this week
- Implementation for MatMul-free LM. ☆2,941 Updated 2 months ago
- Sparsity-aware deep learning inference runtime for CPUs ☆3,077 Updated 5 months ago
- What would you do with 1000 H100s... ☆948 Updated last year
- Explanation to key concepts in ML ☆7,405 Updated this week
- The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch. ☆11,633 Updated last week