riverstone496 / awesome-second-order-optimization
⭐ 28 · Updated last month
Alternatives and similar repositories for awesome-second-order-optimization
Users interested in awesome-second-order-optimization are comparing it to the repositories listed below.
- Small Batch Size Training for Language Models · ⭐ 63 · Updated 3 weeks ago
- 🧱 Modula software package · ⭐ 299 · Updated 2 months ago
- Supporting code for the blog post on modular manifolds · ⭐ 94 · Updated last month
- An implementation of PSGD Kron second-order optimizer for PyTorch · ⭐ 96 · Updated 3 months ago
- ⭐ 120 · Updated 4 months ago
- ⭐ 220 · Updated 11 months ago
- Code and weights for the paper "Cluster and Predict Latent Patches for Improved Masked Image Modeling" · ⭐ 122 · Updated 6 months ago
- ⭐ 60 · Updated last year
- Supporting PyTorch FSDP for optimizers · ⭐ 83 · Updated 10 months ago
- Dion optimizer algorithm · ⭐ 374 · Updated last month
- Deep Networks Grok All the Time and Here is Why · ⭐ 37 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" · ⭐ 85 · Updated last month
- ⭐ 252 · Updated 4 months ago
- ⭐ 91 · Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ⭐ 321 · Updated 3 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources · ⭐ 147 · Updated last month
- Code for the paper "Function-Space Learning Rates" · ⭐ 23 · Updated 4 months ago
- WIP · ⭐ 93 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention · ⭐ 99 · Updated last year
- Flow-matching algorithms in JAX · ⭐ 106 · Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] · ⭐ 68 · Updated last year
- Experiment of using Tangent to autodiff triton · ⭐ 80 · Updated last year
- ⭐ 95 · Updated 8 months ago
- A comprehensive JAX/NNX library for diffusion and flow matching generative algorithms, featuring DiT (Diffusion Transformer) and its vari… · ⭐ 111 · Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs · ⭐ 170 · Updated 4 months ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… · ⭐ 188 · Updated 2 weeks ago
- A simple library for scaling up JAX programs · ⭐ 144 · Updated last year
- ⭐ 44 · Updated 2 months ago
- Implementation of Diffusion Transformer (DiT) in JAX · ⭐ 294 · Updated last year
- Flash Attention Triton kernel with support for second-order derivatives · ⭐ 106 · Updated last week