srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,834 Updated last year
Alternatives and similar repositories for GPU-Puzzles
Users who are interested in GPU-Puzzles are comparing it to the repositories listed below.
- Solve puzzles. Improve your pytorch. ☆3,851 Updated last year
- Material for gpu-mode lectures ☆5,432 Updated 2 weeks ago
- Machine Learning Engineering Open Book ☆16,071 Updated this week
- Puzzles for learning Triton ☆2,187 Updated last year
- The full minitorch student suite. ☆2,260 Updated last year
- GPU programming related news and material links ☆1,874 Updated 3 months ago
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆8,993 Updated last year
- Tile primitives for speedy kernels ☆3,008 Updated 2 weeks ago
- A Python framework for accelerated simulation, data generation and spatial computing. ☆5,945 Updated this week
- Development repository for the Triton language and compiler ☆17,861 Updated last week
- LLM training in simple, raw C/CUDA ☆28,414 Updated 5 months ago
- Inference Llama 2 in one file of pure C ☆19,032 Updated last year
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆8,991 Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,169 Updated 4 months ago
- What would you do with 1000 H100s... ☆1,133 Updated last year
- CUDA Python: Performance meets Productivity ☆3,098 Updated this week
- NanoGPT (124M) in 3 minutes ☆3,974 Updated this week
- A PyTorch native platform for training generative AI models ☆4,866 Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,819 Updated last week
- Implementation for MatMul-free LM. ☆3,042 Updated 3 weeks ago
- Tensor library for machine learning ☆13,743 Updated last week
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,923 Updated 2 months ago
- ☆4,109 Updated last year
- An ML Systems Onboarding list ☆957 Updated 11 months ago
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,623 Updated 3 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,923 Updated 3 months ago
- A computer science textbook ☆4,545 Updated last year
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆10,217 Updated last year
- PyTorch native quantization and sparsity for training and inference ☆2,576 Updated last week
- Understanding Deep Learning - Simon J.D. Prince ☆8,609 Updated last week