evanatyourservice / kron_torch
An implementation of PSGD Kron second-order optimizer for PyTorch
☆86 Updated 2 weeks ago
Alternatives and similar repositories for kron_torch:
Users who are interested in kron_torch are comparing it to the libraries listed below:
- 🧱 Modula software package ☆187 Updated 2 weeks ago
- supporting pytorch FSDP for optimizers ☆80 Updated 4 months ago
- Efficient optimizers ☆186 Updated this week
- ☆173 Updated 4 months ago
- WIP ☆93 Updated 7 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆98 Updated 3 months ago
- ☆79 Updated 11 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆135 Updated last month
- ☆76 Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆103 Updated 4 months ago
- ☆59 Updated 4 months ago
- Focused on fast experimentation and simplicity ☆71 Updated 3 months ago
- Getting crystal-like representations with harmonic loss ☆180 Updated last week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆57 Updated last week
- DeMo: Decoupled Momentum Optimization ☆185 Updated 4 months ago
- ☆92 Updated 2 months ago
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆88 Updated 8 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆62 Updated 7 months ago
- ☆150 Updated 7 months ago
- Scalable and Performant Data Loading ☆234 Updated this week
- The AdEMAMix Optimizer: Better, Faster, Older. ☆180 Updated 7 months ago
- research impl of Native Sparse Attention (2502.11089) ☆53 Updated last month
- ☆53 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆109 Updated 3 months ago
- ☆49 Updated last year
- ☆25 Updated last year
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆365 Updated this week
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam. ☆75 Updated 8 months ago
- ☆215 Updated 8 months ago
- A simple library for scaling up JAX programs ☆134 Updated 5 months ago