KellerJordan / cifar10-airbench
94% on CIFAR-10 in 2.6 seconds · 96% in 27 seconds
★177 · Updated last week
Related projects
Alternatives and complementary repositories for cifar10-airbench
- ★128 · Updated this week
- Scalable neural net training via automatic normalization in the modular norm. ★121 · Updated 3 months ago
- WIP ★89 · Updated 3 months ago
- Efficient optimizers ★79 · Updated this week
- seqax = sequence modeling + JAX ★133 · Updated 4 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ★109 · Updated last week
- The AdEMAMix Optimizer: Better, Faster, Older. ★172 · Updated 2 months ago
- ★197 · Updated 4 months ago
- σ-GPT: A New Approach to Autoregressive Models ★59 · Updated 3 months ago
- Accelerated First Order Parallel Associative Scan ★163 · Updated 3 months ago
- Annotated version of the Mamba paper ★457 · Updated 8 months ago
- Simple implementation of muP, based on the Spectral Condition for Feature Learning. The implementation is SGD only; don't use it with Adam. ★69 · Updated 3 months ago
- For optimization algorithm research and development. ★449 · Updated this week
- ★73 · Updated 4 months ago
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ★328 · Updated last month
- Normalized Transformer (nGPT) ★66 · Updated this week
- A library for unit scaling in PyTorch ★105 · Updated 2 weeks ago
- PyTorch implementation of the PEER block from the paper Mixture of A Million Experts, by Xu Owen He at DeepMind ★112 · Updated 2 months ago
- Experiment of using Tangent to autodiff triton ★72 · Updated 9 months ago
- Understand and test language model architectures on synthetic tasks. ★162 · Updated 6 months ago
- Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training ★113 · Updated 7 months ago
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ★119 · Updated 3 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ★178 · Updated 5 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ★214 · Updated 3 months ago
- A repository for log-time feedforward networks ★216 · Updated 7 months ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ★281 · Updated last month
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ★95 · Updated 2 weeks ago
- ★228 · Updated 2 months ago
- LoRA for arbitrary JAX models and functions ★132 · Updated 8 months ago
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs). ★179 · Updated this week