srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,512 Updated last year
Alternatives and similar repositories for GPU-Puzzles
Users who are interested in GPU-Puzzles are comparing it to the repositories listed below.
- Solve puzzles. Improve your pytorch. ☆3,727 Updated last year
- Material for gpu-mode lectures ☆5,110 Updated 2 weeks ago
- NanoGPT (124M) in 3 minutes ☆3,165 Updated 2 months ago
- LLM training in simple, raw C/CUDA ☆27,769 Updated 3 months ago
- The full minitorch student suite. ☆2,192 Updated last year
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆12,844 Updated last year
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆8,770 Updated last year
- You like pytorch? You like micrograd? You love tinygrad! ❤️ ☆30,205 Updated last week
- CUDA Templates for Linear Algebra Subroutines ☆8,527 Updated 2 weeks ago
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,583 Updated this week
- Machine Learning Engineering Open Book ☆15,386 Updated last week
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,975 Updated last year
- What would you do with 1000 H100s... ☆1,108 Updated last year
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,821 Updated 3 months ago
- Development repository for the Triton language and compiler ☆17,154 Updated this week
- A Python framework for accelerated simulation, data generation and spatial computing. ☆5,595 Updated this week
- An autoregressive character-level language model for making more things ☆3,317 Updated last year
- A PyTorch native platform for training generative AI models ☆4,504 Updated this week
- CUDA Python: Performance meets Productivity ☆2,988 Updated this week
- Flax is a neural network library for JAX that is designed for flexibility. ☆6,845 Updated this week
- A library to generate LaTeX expression from Python code. ☆7,567 Updated 7 months ago
- This project is a stock trend prediction web application created using Python and Streamlit. The purpose of this web application is to al… ☆10 Updated 2 years ago
- Efficient Triton Kernels for LLM Training ☆5,714 Updated this week
- The n-gram Language Model ☆1,447 Updated last year
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,836 Updated last month
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training ☆22,669 Updated last year
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,904 Updated 2 months ago
- An unnecessarily tiny implementation of GPT-2 in NumPy. ☆3,412 Updated 2 years ago
- From the Tensor to Stable Diffusion, a rough outline for a 1 week course. ☆1,069 Updated this week
- Understanding Deep Learning - Simon J.D. Prince ☆8,297 Updated last month