evanatyourservice / psgd_jax
Implementation of PSGD optimizer in JAX
☆11 · Updated this week
Related projects
Alternatives and complementary repositories for psgd_jax
- ☆16 · Updated 2 months ago
- seqax = sequence modeling + JAX ☆132 · Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆72 · Updated 9 months ago
- Automatically take good care of your preemptible TPUs ☆31 · Updated last year
- ☆53 · Updated 9 months ago
- A simple library for scaling up JAX programs ☆125 · Updated last week
- ☆36 · Updated 10 months ago
- ☆72 · Updated 4 months ago
- Scalable neural net training via automatic normalization in the modular norm. ☆119 · Updated 2 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆112 · Updated 6 months ago
- ☆50 · Updated 5 months ago
- Minimal but scalable implementation of large language models in JAX ☆25 · Updated last week
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 · Updated 6 months ago
- LoRA for arbitrary JAX models and functions ☆132 · Updated 8 months ago
- If it quacks like a tensor... ☆52 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 · Updated 2 weeks ago
- ☆197 · Updated 3 months ago
- ☆18 · Updated last month
- ☆27 · Updated 7 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆69 · Updated this week
- Understand and test language model architectures on synthetic tasks. ☆161 · Updated 6 months ago
- ☆46 · Updated last month
- ☆46 · Updated last month
- ☆122 · Updated this week
- ☆19 · Updated 6 months ago
- Efficient PScan implementation in PyTorch ☆15 · Updated 10 months ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated 2 months ago
- The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers" ☆14 · Updated 3 months ago
- Train vision models using JAX and 🤗 transformers ☆95 · Updated 3 weeks ago