evanatyourservice / kron_torch
An implementation of the PSGD Kron second-order optimizer for PyTorch
☆ 97 · Updated 4 months ago
Alternatives and similar repositories for kron_torch
Users who are interested in kron_torch are comparing it to the libraries listed below.
- 🧱 Modula software package · ☆ 309 · Updated 3 months ago
- Supporting PyTorch FSDP for optimizers · ☆ 84 · Updated last year
- Efficient optimizers · ☆ 275 · Updated last month
- ☆ 68 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. · ☆ 186 · Updated last year
- Getting crystal-like representations with harmonic loss · ☆ 192 · Updated 8 months ago
- ☆ 225 · Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" · ☆ 103 · Updated 11 months ago
- Small Batch Size Training for Language Models · ☆ 68 · Updated 2 months ago
- WIP · ☆ 93 · Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources · ☆ 148 · Updated 2 months ago
- ☆ 82 · Updated last year
- ☆ 211 · Updated last year
- DeMo: Decoupled Momentum Optimization · ☆ 197 · Updated last year
- ☆ 121 · Updated 6 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training · ☆ 132 · Updated last year
- ☆ 28 · Updated 2 months ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". · ☆ 139 · Updated 3 weeks ago
- For optimization algorithm research and development. · ☆ 548 · Updated 3 weeks ago
- NanoGPT-speedrunning for the poor T4 enjoyers · ☆ 73 · Updated 7 months ago
- Modular, scalable library to train ML models · ☆ 176 · Updated last week
- Scalable and Performant Data Loading · ☆ 349 · Updated last week
- Dion optimizer algorithm · ☆ 395 · Updated 3 weeks ago
- ☆ 91 · Updated last year
- Focused on fast experimentation and simplicity · ☆ 75 · Updated 11 months ago
- Universal Notation for Tensor Operations in Python. · ☆ 452 · Updated 8 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" · ☆ 85 · Updated 2 months ago
- Supporting code for the blog post on modular manifolds. · ☆ 104 · Updated 2 months ago
- ☆ 314 · Updated last year
- ☆ 105 · Updated 4 months ago