fattorib / ZeRO-transformer
Two implementations of ZeRO-1 optimizer sharding in JAX
☆13 · Updated last year
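For context: ZeRO-1 shards only the optimizer state (e.g. Adam's moment buffers) across devices, while parameters and gradients remain replicated; each device updates its 1/N slice of the parameters, and the full parameter vector is then reassembled with an all-gather. The single-process sketch below illustrates that idea with simulated shards — it is an assumption-laden toy, not the repository's actual implementation, and all names (`zero1_step`, `N_SHARDS`, etc.) are hypothetical.

```python
# Toy ZeRO-1-style optimizer state sharding, simulated in one process.
# Each of N_SHARDS "devices" keeps Adam moments only for its slice of
# the parameters, updates that slice, then the slices are concatenated
# (standing in for the all-gather of updated parameters).
import jax.numpy as jnp

N_SHARDS = 4  # hypothetical number of devices


def init_shard_state(param_shard):
    # Adam first/second moments, stored only for this shard.
    return {"m": jnp.zeros_like(param_shard), "v": jnp.zeros_like(param_shard)}


def adam_update(param_shard, grad_shard, state,
                lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    # Plain Adam step on one shard (bias correction omitted for brevity).
    m = b1 * state["m"] + (1 - b1) * grad_shard
    v = b2 * state["v"] + (1 - b2) * grad_shard ** 2
    new_param = param_shard - lr * m / (jnp.sqrt(v) + eps)
    return new_param, {"m": m, "v": v}


def zero1_step(params, grads, shard_states):
    # Split the (replicated) params and grads into per-device shards.
    param_shards = jnp.split(params, N_SHARDS)
    grad_shards = jnp.split(grads, N_SHARDS)
    new_shards, new_states = [], []
    for p, g, s in zip(param_shards, grad_shards, shard_states):
        p_new, s_new = adam_update(p, g, s)
        new_shards.append(p_new)
        new_states.append(s_new)
    # "All-gather": every device ends up with the full updated parameters.
    return jnp.concatenate(new_shards), new_states


params = jnp.ones(8)
grads = jnp.full(8, 0.5)
states = [init_shard_state(s) for s in jnp.split(params, N_SHARDS)]
params, states = zero1_step(params, grads, states)
```

In a real multi-device setup the Python loop would be replaced by per-device computation over a sharded state, with `jax.lax.all_gather` (or equivalent collective) restoring the replicated parameters.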
Alternatives and similar repositories for ZeRO-transformer:
Users interested in ZeRO-transformer are comparing it to the libraries listed below.
- Experiment of using Tangent to autodiff Triton ☆74 · Updated 11 months ago
- ☆75 · Updated 6 months ago
- A simple library for scaling up JAX programs ☆129 · Updated 2 months ago
- Minimal but scalable implementation of large language models in JAX ☆28 · Updated 2 months ago
- seqax = sequence modeling + JAX ☆136 · Updated 6 months ago
- JAX bindings for Flash Attention v2 ☆83 · Updated 6 months ago
- If it quacks like a tensor... ☆55 · Updated 2 months ago
- Experimenting with how best to do multi-host dataloading ☆10 · Updated 2 years ago
- ☆181 · Updated 3 weeks ago
- A library for unit scaling in PyTorch ☆118 · Updated last month
- Inference code for LLaMA models in JAX ☆114 · Updated 7 months ago
- Machine Learning eXperiment Utilities ☆45 · Updated 7 months ago
- ☆58 · Updated 2 years ago
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training ☆121 · Updated 9 months ago
- ring-attention experiments ☆116 · Updated 3 months ago
- ☆201 · Updated 6 months ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated this week
- This repository contains the experimental PyTorch-native float8 training UX ☆219 · Updated 5 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 ☆43 · Updated 6 months ago
- A set of Python scripts that makes your experience on TPU better ☆44 · Updated 6 months ago
- Jax/Flax rewrite of Karpathy's nanoGPT ☆54 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN ☆67 · Updated 7 months ago
- Triton-based implementation of Sparse Mixture of Experts ☆192 · Updated last month
- ☆275 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs ☆90 · Updated last month
- Accelerated First Order Parallel Associative Scan ☆169 · Updated 4 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆91 · Updated this week
- Train very large language models in Jax ☆198 · Updated last year
- ☆37 · Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆102 · Updated last month