riverstone496 / awesome-second-order-optimization
⭐ 28 · Updated 4 months ago
Alternatives and similar repositories for awesome-second-order-optimization
Users who are interested in awesome-second-order-optimization are comparing it to the libraries listed below.
- Supporting code for the blog post on modular manifolds. ⭐ 115 · Updated 4 months ago
- Small Batch Size Training for Language Models ⭐ 80 · Updated 4 months ago
- ⭐ 62 · Updated last year
- Supporting PyTorch FSDP for optimizers ⭐ 84 · Updated last year
- An implementation of PSGD Kron second-order optimizer for PyTorch ⭐ 98 · Updated 6 months ago
- ⭐ 246 · Updated last year
- ⭐ 147 · Updated this week
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐ 89 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ⭐ 86 · Updated 4 months ago
- Dion optimizer algorithm ⭐ 431 · Updated 3 weeks ago
- Code for the paper "Function-Space Learning Rates" ⭐ 25 · Updated 8 months ago
- Official Implementation of Dynamic erf (Derf). ⭐ 127 · Updated last month
- ⭐ 35 · Updated last year
- 🧱 Modula software package ⭐ 322 · Updated 5 months ago
- ⭐ 124 · Updated 7 months ago
- ⭐ 39 · Updated last year
- Stick-breaking attention ⭐ 62 · Updated 7 months ago
- ⭐ 270 · Updated 8 months ago
- ⭐ 39 · Updated 5 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ⭐ 100 · Updated last year
- Experiment of using Tangent to autodiff Triton ⭐ 82 · Updated 2 years ago
- ⭐ 92 · Updated last year
- Flash Attention Triton kernel with support for second-order derivatives ⭐ 144 · Updated this week
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ⭐ 70 · Updated last year
- WIP ⭐ 93 · Updated last year
- Deep Networks Grok All the Time and Here is Why ⭐ 38 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ⭐ 137 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐ 186 · Updated 3 weeks ago
- Code and weights for the paper "Cluster and Predict Latents Patches for Improved Masked Image Modeling" ⭐ 130 · Updated this week
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ⭐ 120 · Updated last month