riverstone496 / awesome-second-order-optimization
⭐27 · Updated last year
Alternatives and similar repositories for awesome-second-order-optimization
Users interested in awesome-second-order-optimization are comparing it to the libraries listed below.
- Small Batch Size Training for Language Models · ⭐41 · Updated this week
- An implementation of PSGD Kron second-order optimizer for PyTorch · ⭐94 · Updated 2 weeks ago
- ⭐206 · Updated 8 months ago
- Supporting PyTorch FSDP for optimizers · ⭐84 · Updated 8 months ago
- WIP · ⭐94 · Updated 11 months ago
- Code and weights for the paper "Cluster and Predict Latent Patches for Improved Masked Image Modeling" · ⭐113 · Updated 3 months ago
- 🧱 Modula software package · ⭐216 · Updated last week
- Implementations of attention with the softpick function, naive and FlashAttention-2 · ⭐81 · Updated 3 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources · ⭐143 · Updated 2 months ago
- ⭐115 · Updated last month
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule · ⭐193 · Updated 4 months ago
- Code for the paper "Function-Space Learning Rates" · ⭐23 · Updated 2 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ⭐275 · Updated 3 weeks ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" · ⭐97 · Updated last month
- ⭐53 · Updated 10 months ago
- ⭐51 · Updated last year
- ⭐79 · Updated 5 months ago
- Normalized Transformer (nGPT) · ⭐185 · Updated 8 months ago
- A basic pure PyTorch implementation of FlashAttention · ⭐16 · Updated 9 months ago
- Stick-breaking attention · ⭐59 · Updated last month
- Explorations into the recently proposed Taylor Series Linear Attention · ⭐100 · Updated 11 months ago
- ⭐232 · Updated 2 months ago
- ⭐83 · Updated last year
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training · ⭐130 · Updated last year
- Flow-matching algorithms in JAX · ⭐100 · Updated 11 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. · ⭐82 · Updated 3 weeks ago
- ⭐65 · Updated 8 months ago
- Dion optimizer algorithm · ⭐193 · Updated this week
- Implementation of Diffusion Transformer (DiT) in JAX · ⭐286 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" · ⭐118 · Updated last month