evanatyourservice / kron_torch
An implementation of the PSGD Kron second-order optimizer for PyTorch
★96 · Updated last month
Alternatives and similar repositories for kron_torch
Users who are interested in kron_torch are comparing it to the libraries listed below.
- supporting pytorch FSDP for optimizers ★84 · Updated 8 months ago
- 🧱 Modula software package ★225 · Updated last week
- Efficient optimizers ★254 · Updated 3 weeks ago
- Scalable and Performant Data Loading ★291 · Updated this week
- ★207 · Updated 8 months ago
- Dion optimizer algorithm ★305 · Updated this week
- DeMo: Decoupled Momentum Optimization ★190 · Updated 8 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ★185 · Updated 11 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ★101 · Updated 8 months ago
- Getting crystal-like representations with harmonic loss ★194 · Updated 4 months ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". ★131 · Updated 2 weeks ago
- ★82 · Updated last year
- ★65 · Updated 9 months ago
- ★115 · Updated 2 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ★143 · Updated 3 months ago
- ★150 · Updated last year
- WIP ★94 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ★153 · Updated 2 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ★94 · Updated 2 months ago
- Small Batch Size Training for Language Models ★43 · Updated this week
- ★27 · Updated last year
- ★87 · Updated last year
- For optimization algorithm research and development. ★530 · Updated this week
- ★101 · Updated last month
- Focused on fast experimentation and simplicity ★76 · Updated 8 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ★284 · Updated last month
- Explorations into the recently proposed Taylor Series Linear Attention ★100 · Updated last year
- ★56 · Updated 10 months ago
- ★307 · Updated last year
- Fast, Modern, and Low Precision PyTorch Optimizers ★108 · Updated 3 weeks ago