srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,355 Updated 11 months ago
Alternatives and similar repositories for GPU-Puzzles
Users who are interested in GPU-Puzzles are comparing it to the libraries listed below.
- Solve puzzles. Improve your pytorch. ☆3,671 Updated last year
- Machine Learning Engineering Open Book ☆14,734 Updated 2 weeks ago
- Material for gpu-mode lectures ☆4,842 Updated last month
- The full minitorch student suite. ☆2,143 Updated 11 months ago
- Puzzles for learning Triton ☆1,832 Updated 8 months ago
- GPU programming related news and material links ☆1,652 Updated 7 months ago
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆12,528 Updated last year
- A PyTorch native platform for training generative AI models ☆4,240 Updated this week
- NanoGPT (124M) in 3 minutes ☆3,025 Updated 3 weeks ago
- Inference Llama 2 in one file of pure C ☆18,626 Updated last year
- llama3 implementation one matrix multiplication at a time ☆15,097 Updated last year
- Tile primitives for speedy kernels ☆2,570 Updated last week
- A Python framework for accelerated simulation, data generation and spatial computing. ☆5,406 Updated this week
- Development repository for the Triton language and compiler ☆16,484 Updated last week
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆8,641 Updated 11 months ago
- An autoregressive character-level language model for making more things ☆3,237 Updated last year
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,672 Updated 3 weeks ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,836 Updated last year
- What would you do with 1000 H100s... ☆1,083 Updated last year
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training ☆22,410 Updated last year
- LLM training in simple, raw C/CUDA ☆27,349 Updated last month
- PyTorch native post-training library ☆5,399 Updated last week
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆9,115 Updated this week
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,899 Updated last month
- CUDA Templates for Linear Algebra Subroutines ☆8,222 Updated this week
- Neural Networks: Zero to Hero ☆15,381 Updated 11 months ago
- The n-gram Language Model ☆1,436 Updated last year
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,809 Updated last month
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,654 Updated last month
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,044 Updated 4 months ago