dshah3 / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆63 Updated last year
Alternatives and similar repositories for GPU-Puzzles:
Users who are interested in GPU-Puzzles are comparing it to the repositories listed below.
- ☆87 Updated last year
- ☆76 Updated 9 months ago
- ☆153 Updated last year
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆178 Updated last year
- seqax = sequence modeling + JAX ☆153 Updated last week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆129 Updated last year
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆123 Updated 11 months ago
- supporting pytorch FSDP for optimizers ☆80 Updated 4 months ago
- A really tiny autograd engine ☆91 Updated last year
- Cost aware hyperparameter tuning algorithm ☆150 Updated 9 months ago
- Puzzles for exploring transformers ☆342 Updated last year
- ☆215 Updated 9 months ago
- Experiment of using Tangent to autodiff triton ☆78 Updated last year
- ☆60 Updated 3 years ago
- ☆428 Updated 5 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆109 Updated 3 months ago
- Simple Transformer in Jax ☆136 Updated 9 months ago
- ☆98 Updated last week
- A puzzle to learn about prompting ☆126 Updated last year
- Minimal but scalable implementation of large language models in JAX ☆34 Updated 5 months ago
- ☆53 Updated last year
- Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆51 Updated last year
- JAX implementation of the Llama 2 model ☆217 Updated last year
- ☆166 Updated 2 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆270 Updated 10 months ago
- Efficient optimizers ☆188 Updated this week
- A set of Python scripts that makes your experience on TPU better ☆50 Updated 9 months ago
- ring-attention experiments ☆129 Updated 5 months ago
- WIP ☆93 Updated 8 months ago
- ☆32 Updated 10 months ago