fattorib / ZeRO-transformer
Two implementations of ZeRO-1 optimizer sharding in JAX
☆12, updated last year
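ZeRO-1 partitions only the optimizer state across data-parallel devices. Parameters and gradients stay replicated, but each device keeps and updates the optimizer moments for just one slice of the parameters. The sketch below is not taken from this repository. It is a minimal illustration that assumes Adam as the optimizer, a flat parameter vector whose length divides evenly by the device count, `jax.pmap` with the `jax.lax` collectives `psum_scatter` and `all_gather`, and four simulated CPU devices. The names `zero1_step` and `loss_fn` and the toy least-squares data are hypothetical.

```python
import os

# Assumption: simulate 4 devices on a CPU-only host; this must run before JAX is imported.
_flags = os.environ.get("XLA_FLAGS", "")
if "xla_force_host_platform_device_count" not in _flags:
    os.environ["XLA_FLAGS"] = (_flags + " --xla_force_host_platform_device_count=4").strip()

import jax
import jax.numpy as jnp

N_DEV = jax.local_device_count()
D = 4 * N_DEV            # parameter count, divisible by the number of devices
K = D // N_DEV           # size of each device's optimizer-state shard
B1, B2, EPS, LR = 0.9, 0.999, 1e-8, 0.05


def loss_fn(w, x, y):
    # Hypothetical toy model: least-squares linear regression.
    return jnp.mean((x @ w - y) ** 2)


def zero1_step(w, m, v, t, x, y):
    # w: full parameters (replicated); m, v: this device's Adam moments, shape (K,).
    loss, g = jax.value_and_grad(loss_fn)(w, x, y)
    # Reduce-scatter: each device receives only its slice of the mean gradient.
    g_shard = jax.lax.psum_scatter(g, "dev", scatter_dimension=0, tiled=True) / N_DEV
    idx = jax.lax.axis_index("dev")
    w_shard = jax.lax.dynamic_slice(w, (idx * K,), (K,))
    # Adam update, applied only to the locally owned slice.
    t = t + 1.0
    m = B1 * m + (1 - B1) * g_shard
    v = B2 * v + (1 - B2) * g_shard**2
    m_hat = m / (1 - B1**t)
    v_hat = v / (1 - B2**t)
    w_shard = w_shard - LR * m_hat / (jnp.sqrt(v_hat) + EPS)
    # All-gather: every device again holds the full, identical parameter vector.
    w = jax.lax.all_gather(w_shard, "dev", tiled=True)
    return w, m, v, t, jax.lax.pmean(loss, "dev")


step = jax.pmap(zero1_step, axis_name="dev")

# Synthetic data, split evenly across devices (data parallelism).
key = jax.random.PRNGKey(0)
true_w = jnp.arange(D, dtype=jnp.float32) / D
x = jax.random.normal(key, (N_DEV * 32, D))
y = x @ true_w
xs, ys = x.reshape(N_DEV, 32, D), y.reshape(N_DEV, 32)

w = jnp.zeros((N_DEV, D))   # one full parameter replica per device
m = jnp.zeros((N_DEV, K))   # optimizer state: one shard per device
v = jnp.zeros((N_DEV, K))
t = jnp.zeros((N_DEV,))

losses = []
for _ in range(300):
    w, m, v, t, loss = step(w, m, v, t, xs, ys)
    losses.append(float(loss[0]))
```

Each device stores `2 * K` Adam moment values instead of `2 * D`, which is the memory saving ZeRO-1 targets; the cost is an all-gather of the updated parameters after every step. A reduce-scatter is used here because each device only needs the gradient slice matching its optimizer shard; an implementation could equally all-reduce the full gradient and slice it locally.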
Related projects:
- LoRA for arbitrary JAX models and functions (☆127, updated 6 months ago)
- A simple library for scaling up JAX programs (☆116, updated last month)
- ☆56, updated 2 years ago
- seqax = sequence modeling + JAX (☆129, updated 2 months ago)
- JAX bindings for Flash Attention v2 (☆76, updated 2 months ago)
- Machine Learning eXperiment Utilities (☆42, updated 3 months ago)
- ☆27, updated this week
- ☆68, updated 2 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. (☆34, updated 2 months ago)
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. (☆29, updated 3 weeks ago)
- If it quacks like a tensor... (☆48, updated 7 months ago)
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. (☆94, updated this week)
- ☆33, updated 8 months ago
- ☆172, updated this week
- Inference code for LLaMA models in JAX (☆108, updated 4 months ago)
- Experiment of using Tangent to autodiff triton (☆66, updated 7 months ago)
- some common Huggingface transformers in maximal update parametrization (µP) (☆76, updated 2 years ago)
- Train very large language models in Jax. (☆191, updated 10 months ago)
- An implementation of the Llama architecture, to instruct and delight (☆21, updated last month)
- JMP is a Mixed Precision library for JAX. (☆183, updated 4 months ago)
- ☆28, updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆150, updated last week)
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu (☆25, updated 3 weeks ago)
- A MAD laboratory to improve AI architecture designs 🧪 (☆84, updated 4 months ago)
- ☆180, updated 2 months ago
- Minimal but scalable implementation of large language models in JAX (☆17, updated 3 weeks ago)
- A port of muP to JAX/Haiku (☆25, updated last year)
- Jax/Flax rewrite of Karpathy's nanoGPT (☆46, updated last year)
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX (☆74, updated 7 months ago)
- ☆42, updated 3 months ago