KellerJordan / hlb-CIFAR10
Train to 94% on CIFAR-10 in 4.4 seconds on a single A100
☆12 · Updated 10 months ago
Related projects
Alternatives and complementary repositories for hlb-CIFAR10
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead (☆121, updated this week)
- Simple Transformer in Jax (☆119, updated 5 months ago)
- Experiment of using Tangent to autodiff triton (☆72, updated 10 months ago)
- A set of Python scripts that make your experience on TPU better (☆40, updated 4 months ago)
- seqax = sequence modeling + JAX (☆134, updated 4 months ago)
- 94% on CIFAR-10 in 2.6 seconds 💨 96% in 27 seconds (☆178, updated 2 weeks ago)
- Solve puzzles. Learn CUDA. (☆61, updated 11 months ago)
- Scalable neural net training via automatic normalization in the modular norm (☆122, updated this week)
- WIP (☆89, updated 3 months ago)
- The simplest, fastest repository for training/finetuning medium-sized GPTs (☆84, updated this week)
- A really tiny autograd engine (☆87, updated 7 months ago)
- Normalized Transformer (nGPT) (☆94, updated this week)
- Accelerated First Order Parallel Associative Scan (☆164, updated 3 months ago)
- Simplex Random Feature attention, in PyTorch (☆71, updated last year)
- Efficient optimizers (☆87, updated this week)
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* (☆80, updated 11 months ago)
- train with kittens! (☆49, updated 3 weeks ago)
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) (☆180, updated 5 months ago)
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training (☆113, updated 7 months ago)
- smolLM with Entropix sampler on pytorch (☆141, updated 3 weeks ago)
- Experiments for efforts to train a new and improved t5 (☆76, updated 7 months ago)