KellerJordan / hlb-CIFAR10
Train to 94% on CIFAR-10 in 4.4 seconds on a single A100
☆12 Updated 10 months ago
Related projects
Alternatives and complementary repositories for hlb-CIFAR10
- Scalable neural net training via automatic normalization in the modular norm. ☆119 Updated 2 months ago
- Simple Transformer in Jax ☆115 Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 Updated last week
- ☆72 Updated 4 months ago
- Experiment of using Tangent to autodiff triton ☆72 Updated 9 months ago
- ☆122 Updated this week
- seqax = sequence modeling + JAX ☆132 Updated 3 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆69 Updated this week
- ☆49 Updated 7 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆112 Updated 6 months ago
- Understand and test language model architectures on synthetic tasks. ☆161 Updated 6 months ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆80 Updated 10 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 Updated 6 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆177 Updated 5 months ago
- ☆20 Updated last year
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam ☆69 Updated 3 months ago
- LoRA for arbitrary JAX models and functions ☆132 Updated 8 months ago
- A set of Python scripts that makes your experience on TPU better ☆40 Updated 4 months ago
- WIP ☆89 Updated 2 months ago
- Experiments for efforts to train a new and improved t5 ☆76 Updated 6 months ago
- code for training & evaluating Contextual Document Embedding models ☆93 Updated this week
- ☆197 Updated 3 months ago
- ☆53 Updated 9 months ago
- ☆27 Updated 4 months ago
- ☆76 Updated 6 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆93 Updated last week
- Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆48 Updated 7 months ago
- 94% on CIFAR-10 in 2.59 seconds 💨 96% in 27 seconds ☆168 Updated this week
- A library for unit scaling in PyTorch ☆105 Updated this week
- ☆46 Updated last month