KindXiaoming / grow-crystals
Getting crystal-like representations with harmonic loss
☆177 Updated last month
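The harmonic loss referenced in the description replaces the usual linear-plus-softmax readout with a distance-based one: each class has a learned center, the score for a class is the Euclidean distance from the input representation to that center, and class probabilities follow an inverse-power (harmonic) weighting of those distances. Below is a minimal PyTorch sketch of that idea, assuming the formulation from the harmonic-loss paper; the class name, `exponent`, `eps`, and the random initialization are illustrative assumptions, not the grow-crystals API.

```python
import torch
import torch.nn as nn

class HarmonicLoss(nn.Module):
    """Sketch of a harmonic-loss readout: class scores are distances to
    learned class centers, probabilities follow an inverse-power law.
    Names and defaults are assumptions, not the repository's actual API."""

    def __init__(self, dim, num_classes, exponent=2.0, eps=1e-8):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, dim))  # one learned center per class
        self.exponent = exponent  # harmonic exponent n
        self.eps = eps            # avoids log(0) when an input coincides with a center

    def forward(self, x, targets):
        # Euclidean distance from each input to each class center: (batch, num_classes)
        d = torch.cdist(x, self.centers) + self.eps
        # Harmonic probabilities p_i ∝ 1 / d_i^n, computed in log space for stability
        log_p = -self.exponent * torch.log(d)
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)
        # Negative log-likelihood of the target classes
        return -log_p.gather(1, targets.unsqueeze(1)).mean()

# Usage sketch: loss = HarmonicLoss(dim=64, num_classes=10)(features, labels)
```

The intended effect, per the repository description, is that the learned class centers form geometrically structured ("crystal-like") representations rather than the unconstrained weight vectors of a standard softmax layer.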
Alternatives and similar repositories for grow-crystals:
Users interested in grow-crystals are comparing it to the repositories listed below.
- DeMo: Decoupled Momentum Optimization ☆185 Updated 3 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆98 Updated 3 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆84 Updated last month
- ☆91 Updated 2 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 4 months ago
- Efficient optimizers ☆184 Updated 2 weeks ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆91 Updated 2 weeks ago
- Focused on fast experimentation and simplicity ☆69 Updated 3 months ago
- supporting pytorch FSDP for optimizers ☆79 Updated 3 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆179 Updated 6 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆221 Updated 3 weeks ago
- ☆169 Updated 3 months ago
- ☆36 Updated 3 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆135 Updated 2 weeks ago
- ☆79 Updated 11 months ago
- Visualizations of the theory behind diffusion models. ☆151 Updated 11 months ago
- look how they massacred my boy ☆63 Updated 5 months ago
- ☆119 Updated 3 weeks ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆108 Updated 3 months ago
- Simple Transformer in Jax ☆136 Updated 9 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆214 Updated last month
- ☆149 Updated 7 months ago
- ☆105 Updated 3 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆276 Updated this week
- Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster ☆63 Updated last month
- σ-GPT: A New Approach to Autoregressive Models ☆62 Updated 7 months ago
- Muon optimizer: +>30% sample efficiency with <3% wallclock overhead ☆521 Updated 2 weeks ago
- ☆301 Updated 9 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆85 Updated this week
- 🧱 Modula software package ☆173 Updated 2 weeks ago