srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,915 · Updated last year
Alternatives and similar repositories for GPU-Puzzles
Users who are interested in GPU-Puzzles are comparing it to the libraries listed below.
- Solve puzzles. Improve your pytorch. ☆3,901 · Updated last year
- Material for gpu-mode lectures ☆5,640 · Updated last month
- Machine Learning Engineering Open Book ☆16,548 · Updated last week
- The full minitorch student suite. ☆2,287 · Updated last year
- A Python framework for accelerated simulation, data generation and spatial computing. ☆6,162 · Updated this week
- NanoGPT (124M) in 2 minutes ☆4,515 · Updated this week
- Development repository for the Triton language and compiler ☆18,319 · Updated this week
- What would you do with 1000 H100s... ☆1,148 · Updated 2 years ago
- A PyTorch native platform for training generative AI models ☆5,023 · Updated this week
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,855 · Updated 7 months ago
- Fast and memory-efficient exact attention ☆21,957 · Updated this week
- CUDA Python: Performance meets Productivity ☆3,149 · Updated this week
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. ☆6,516 · Updated last week
- ☆2,866 · Updated 3 months ago
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆9,185 · Updated last week
- llama3 implementation one matrix multiplication at a time ☆15,240 · Updated last year
- Schedule-Free Optimization in PyTorch ☆2,254 · Updated 8 months ago
- Neural Networks: Zero to Hero ☆20,079 · Updated last year
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,700 · Updated 3 weeks ago
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,180 · Updated 5 months ago
- ☆4,113 · Updated last year
- Pen and paper exercises in machine learning ☆2,577 · Updated last year
- PyTorch native post-training library ☆5,660 · Updated this week
- PyTorch native quantization and sparsity for training and inference ☆2,657 · Updated this week
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆14,478 · Updated last year
- Efficient Triton Kernels for LLM Training ☆6,092 · Updated this week
- The Art of Debugging Open Book ☆1,276 · Updated 2 weeks ago
- LLM training in simple, raw C/CUDA ☆28,763 · Updated 7 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆2,033 · Updated 5 months ago
- A deep-dive on the entire history of deep-learning ☆1,514 · Updated last year