srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,453 Updated last year
Alternatives and similar repositories for GPU-Puzzles
Users who are interested in GPU-Puzzles are comparing it to the repositories listed below.
- Solve puzzles. Improve your pytorch. ☆3,714 Updated last year
- Puzzles for learning Triton ☆1,985 Updated 9 months ago
- Material for gpu-mode lectures ☆5,012 Updated 2 weeks ago
- Development repository for the Triton language and compiler ☆16,831 Updated last week
- Tile primitives for speedy kernels ☆2,688 Updated this week
- The full minitorch student suite. ☆2,162 Updated last year
- GPU programming related news and material links ☆1,689 Updated 8 months ago
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,505 Updated this week
- A PyTorch native platform for training generative AI models ☆4,395 Updated this week
- NanoGPT (124M) in 3 minutes ☆3,117 Updated 2 months ago
- Machine Learning Engineering Open Book ☆15,076 Updated this week
- llama3 implementation one matrix multiplication at a time ☆15,148 Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆8,427 Updated last week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,093 Updated 3 weeks ago
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆8,724 Updated last year
- The official PyTorch implementation of Google's Gemma models ☆5,543 Updated 3 months ago
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆12,736 Updated last year
- Implementation for MatMul-free LM. ☆3,032 Updated last month
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,565 Updated this week
- An autoregressive character-level language model for making more things ☆3,304 Updated last year
- Inference Llama 2 in one file of pure C ☆18,735 Updated last year
- What would you do with 1000 H100s... ☆1,100 Updated last year
- This project is a stock trend prediction web application created using Python and Streamlit. The purpose of this web application is to al… ☆10 Updated 2 years ago
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training ☆22,582 Updated last year
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,720 Updated 3 weeks ago
- An ML Systems Onboarding list ☆898 Updated 7 months ago
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. ☆6,115 Updated last week
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,922 Updated last year
- Flax is a neural network library for JAX that is designed for flexibility. ☆6,793 Updated last week
- PyTorch native post-training library ☆5,484 Updated this week