dshah3 / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆61 · Updated 11 months ago
Related projects
Alternatives and complementary repositories for GPU-Puzzles
- ☆133 · Updated 9 months ago
- ☆73 · Updated 4 months ago
- seqax = sequence modeling + JAX · ☆133 · Updated 4 months ago
- An implementation of the transformer architecture as an NVIDIA CUDA kernel · ☆157 · Updated last year
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training · ☆113 · Updated 7 months ago
- ☆197 · Updated 4 months ago
- ☆82 · Updated 8 months ago
- A puzzle to learn about prompting · ☆121 · Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ☆107 · Updated last year
- Experiment in using Tangent to autodiff Triton · ☆72 · Updated 9 months ago
- ☆53 · Updated 10 months ago
- Cost-aware hyperparameter tuning algorithm · ☆123 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ☆95 · Updated 6 months ago
- Puzzles for exploring transformers · ☆325 · Updated last year
- A simple library for scaling up JAX programs · ☆127 · Updated 2 weeks ago
- JAX implementation of the Llama 2 model · ☆210 · Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs · ☆84 · Updated last week
- Fast bare-bones BPE for modern tokenizer training · ☆142 · Updated last month
- ☆27 · Updated 4 months ago
- ☆224 · Updated 4 months ago
- ☆292 · Updated 4 months ago
- WIP · ☆89 · Updated 3 months ago
- Implementation of Diffusion Transformer (DiT) in JAX · ☆252 · Updated 5 months ago
- An interactive exploration of Transformer programming · ☆246 · Updated last year
- This repository contains the experimental PyTorch-native float8 training UX · ☆211 · Updated 3 months ago
- Scalable neural net training via automatic normalization in the modular norm · ☆121 · Updated 3 months ago
- ML/DL Math and Method notes · ☆57 · Updated 11 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton · ☆483 · Updated 3 weeks ago
- ☆152 · Updated this week
- ☆128 · Updated this week