evanatyourservice / kron_torch
An implementation of PSGD Kron second-order optimizer for PyTorch
☆95 · Updated 2 months ago
Alternatives and similar repositories for kron_torch
Users who are interested in kron_torch are comparing it to the libraries listed below.
- Getting crystal-like representations with harmonic loss ☆194 · Updated 6 months ago
- supporting pytorch FSDP for optimizers ☆84 · Updated 10 months ago
- 🧱 Modula software package ☆277 · Updated last month
- Efficient optimizers ☆265 · Updated this week
- ☆120 · Updated 3 months ago
- WIP ☆93 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Updated last year
- ☆67 · Updated 10 months ago
- ☆215 · Updated 10 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆102 · Updated 9 months ago
- ☆150 · Updated last year
- Dion optimizer algorithm ☆360 · Updated last week
- Focused on fast experimentation and simplicity ☆75 · Updated 9 months ago
- Scalable and Performant Data Loading ☆304 · Updated 2 weeks ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". ☆133 · Updated last month
- ☆91 · Updated last year
- ☆82 · Updated last year
- For optimization algorithm research and development. ☆539 · Updated last week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆72 · Updated 5 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- DeMo: Decoupled Momentum Optimization ☆193 · Updated 10 months ago
- Supporting code for the blog post on modular manifolds. ☆71 · Updated last week
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆146 · Updated last week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆164 · Updated 3 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆98 · Updated 3 months ago
- Normalized Transformer (nGPT) ☆190 · Updated 10 months ago
- Universal Notation for Tensor Operations in Python. ☆433 · Updated 6 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆303 · Updated 2 months ago
- research impl of Native Sparse Attention (2502.11089) ☆61 · Updated 7 months ago
- ☆28 · Updated last week