KindXiaoming / grow-crystals
Getting crystal-like representations with harmonic loss
☆182 · Updated 2 weeks ago
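For context, harmonic loss (the technique named above) replaces cross-entropy's dot-product logits with distances to per-class weight vectors, so each class acquires a geometric "center". A minimal sketch of that idea, assuming the p_i ∝ 1/d_i^n formulation from the accompanying paper; the function name, epsilon, and default exponent here are illustrative, not this repo's actual API:

```python
import torch
import torch.nn.functional as F

def harmonic_loss(x, W, targets, n=2.0, eps=1e-9):
    # x: (batch, dim) penultimate activations; W: (classes, dim) class centers.
    # Harmonic probabilities: p_i = d_i^(-n) / sum_j d_j^(-n), d_i = ||w_i - x||.
    d = torch.cdist(x, W)              # (batch, classes) Euclidean distances
    logits = -n * torch.log(d + eps)   # softmax of these recovers p_i exactly
    return F.cross_entropy(logits, targets)
```

This drops in for the usual final nn.Linear plus cross-entropy pair: the rows of W play the role of the unembedding weights, and gradients flow through distances instead of dot products.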
Alternatives and similar repositories for grow-crystals:
Users interested in grow-crystals are comparing it to the libraries listed below.
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆86 · Updated 2 weeks ago
- Supporting PyTorch FSDP for optimizers ☆80 · Updated 4 months ago
- DeMo: Decoupled Momentum Optimization ☆186 · Updated 4 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆180 · Updated 7 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- Efficient optimizers ☆188 · Updated last week
- ☆93 · Updated 2 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆229 · Updated last month
- 🧱 Modula software package ☆188 · Updated 3 weeks ago
- Focused on fast experimentation and simplicity ☆71 · Updated 3 months ago
- smolLM with Entropix sampler on PyTorch ☆151 · Updated 5 months ago
- ☆107 · Updated 3 months ago
- Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster ☆64 · Updated this week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆60 · Updated last week
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆95 · Updated last month
- ☆173 · Updated 4 months ago
- ☆302 · Updated 9 months ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆73 · Updated 3 weeks ago
- Normalized Transformer (nGPT) ☆167 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆104 · Updated 4 months ago
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" (see the EMA-filter sketch after this list) ☆555 · Updated 9 months ago
- ☆36 · Updated 4 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆62 · Updated 8 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆109 · Updated 4 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆217 · Updated 2 weeks ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆135 · Updated last month
- ☆134 · Updated last week
- ☆79 · Updated last year
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI (see the normalization sketch after this list) ☆279 · Updated 3 weeks ago
- An extension of the nanoGPT repository for training small MoE models ☆123 · Updated last month
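Several entries above concern Grokfast, which accelerates grokking by low-pass filtering each parameter's gradient with an exponential moving average and amplifying that slow component before the optimizer step. A minimal sketch of the EMA variant, assuming the update ĝ = g + λ·EMA(g) from the paper; the helper name and defaults mirror the paper's notation but may not match either repo's exact API:

```python
import torch

@torch.no_grad()
def gradfilter_ema(model, grads=None, alpha=0.98, lamb=2.0):
    # Grokfast-EMA: amplify the slow gradient component, g <- g + lamb * ema(g).
    if grads is None:
        # First call: initialize the per-parameter EMA state from current grads.
        grads = {n: p.grad.detach().clone()
                 for n, p in model.named_parameters() if p.grad is not None}
    for n, p in model.named_parameters():
        if p.grad is None:
            continue
        grads[n].mul_(alpha).add_(p.grad, alpha=1.0 - alpha)  # update EMA
        p.grad.add_(grads[n], alpha=lamb)                     # amplify slow part
    return grads

# Usage: call between loss.backward() and optimizer.step(), threading the
# returned state dict through successive steps:
#   state = gradfilter_ema(model, grads=state)
```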
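The nGPT entries share one core idea: keep hidden states (and weight rows) on the unit hypersphere and replace the residual addition with a normalized interpolation toward each block's output. A simplified sketch of that update; the paper uses learnable per-dimension "eigen learning rates" where this uses a single scalar alpha, and the function name is illustrative:

```python
import torch
import torch.nn.functional as F

def ngpt_residual_update(h, block_out, alpha=0.05):
    # Move h along the hypersphere toward the block output, then re-normalize:
    # h <- Norm(h + alpha * (Norm(block_out) - h)), with h kept at unit norm.
    h = F.normalize(h, dim=-1)
    block_out = F.normalize(block_out, dim=-1)
    return F.normalize(h + alpha * (block_out - h), dim=-1)
```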