dshah3 / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆64 · Updated last year
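For a flavor of the exercises, here is a minimal, self-contained CUDA kernel of the kind such puzzles work up to. This is an illustrative sketch written for this listing, not code taken from the repository.

```cuda
// Illustrative sketch: add two vectors on the GPU, one thread per element.
// All names here are invented for the example, not taken from GPU-Puzzles.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vecAdd(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                       // guard: the grid may be larger than n
        out[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a, *b, *out;
    cudaMallocManaged(&a, bytes);      // unified memory keeps the example short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&out, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up so every element is covered
    vecAdd<<<blocks, threads>>>(a, b, out, n);
    cudaDeviceSynchronize();           // wait before reading results on the host

    printf("out[0] = %f\n", out[0]);   // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

Puzzles in this style typically vary one ingredient at a time: the thread-to-data mapping, the bounds guard, or the use of shared memory.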
Alternatives and similar repositories for GPU-Puzzles:
Users interested in GPU-Puzzles are comparing it to the libraries listed below.
- ☆78 · Updated 10 months ago
- ☆155 · Updated last year
- seqax = sequence modeling + JAX ☆155 · Updated 3 weeks ago
- ☆88 · Updated last year
- A puzzle to learn about prompting ☆127 · Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆130 · Updated last year
- ☆430 · Updated 6 months ago
- ☆60 · Updated 3 years ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆123 · Updated last year
- ☆217 · Updated 9 months ago
- Supporting PyTorch FSDP for optimizers ☆80 · Updated 4 months ago
- Experiment in using Tangent to autodiff Triton ☆78 · Updated last year
- Puzzles for exploring transformers ☆344 · Updated 2 years ago
- An implementation of the transformer architecture in Nvidia CUDA kernels ☆180 · Updated last year
- A simple library for scaling up JAX programs ☆134 · Updated 6 months ago
- ☆105 · Updated this week
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 6 months ago
- A really tiny autograd engine ☆92 · Updated last year
- A set of Python scripts that make your experience on TPU better ☆52 · Updated 10 months ago
- ring-attention experiments ☆132 · Updated 6 months ago
- Simple Transformer in JAX ☆136 · Updated 10 months ago
- WIP ☆93 · Updated 8 months ago
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆51 · Updated last year
- Custom Triton kernels for training Karpathy's nanoGPT. ☆18 · Updated 6 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆272 · Updated 10 months ago
- prime-rl is a codebase for decentralized RL training at scale ☆85 · Updated this week
- JAX implementation of the Llama 2 model ☆218 · Updated last year
- Implementation of Flash Attention in JAX ☆206 · Updated last year
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023); the core identity is sketched after this list ☆92 · Updated 5 months ago
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ☆82 · Updated last year
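For the Heinsen (2023) entry above: the computation being parallelized is the linear recurrence x_t = a_t x_{t-1} + b_t. The sketch below states the identity that makes a parallel prefix (scan) formulation possible; it is derived here directly from the recurrence, so the notation may differ from the paper's.

```latex
% Dividing both sides of x_t = a_t x_{t-1} + b_t by the running product
% P_t turns the recurrence into a pure cumulative sum:
\[
x_t = P_t \left( x_0 + \sum_{i=1}^{t} \frac{b_i}{P_i} \right),
\qquad
P_t = \prod_{i=1}^{t} a_i = \exp\!\Big( \sum_{i=1}^{t} \log a_i \Big).
\]
% Both the running product and the running sum are prefix (cumulative-sum)
% operations, so every x_t can be computed in parallel; the terms can also
% be evaluated in log space for numerical stability.
```

In short, a sequential-looking recurrence becomes two scans plus elementwise arithmetic, which is what gives the parallel speedup.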