riverstone496 / awesome-second-order-optimization
⭐ 27 · Updated 3 months ago
Alternatives and similar repositories for awesome-second-order-optimization
Users that are interested in awesome-second-order-optimization are comparing it to the libraries listed below
- Supporting code for the blog post on modular manifolds. ⭐ 110 · Updated 3 months ago
- Small Batch Size Training for Language Models ⭐ 80 · Updated 3 months ago
- A comprehensive JAX/NNX library for diffusion and flow matching generative algorithms, featuring DiT (Diffusion Transformer) and its vari… ⭐ 128 · Updated 3 months ago
- ⭐ 62 · Updated last year
- ⭐ 123 · Updated 7 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐ 87 · Updated last year
- WIP ⭐ 93 · Updated last year
- An implementation of the PSGD Kron second-order optimizer for PyTorch ⭐ 98 · Updated 5 months ago
- Code for the paper "Function-Space Learning Rates" ⭐ 23 · Updated 7 months ago
- Supporting PyTorch FSDP for optimizers ⭐ 84 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ⭐ 86 · Updated 4 months ago
- Deep Networks Grok All the Time and Here is Why ⭐ 38 · Updated last year
- Implementations and experimentation on mHC by DeepSeek (https://arxiv.org/abs/2512.24880) ⭐ 242 · Updated 2 weeks ago
- Flash Attention Triton kernel with support for second-order derivatives ⭐ 131 · Updated last month
- ⭐ 35 · Updated last year
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ⭐ 82 · Updated 7 months ago
- Flow-matching algorithms in JAX ⭐ 114 · Updated last year
- 🧱 Modula software package ⭐ 321 · Updated 5 months ago
- ⭐ 38 · Updated last year
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ⭐ 119 · Updated 2 weeks ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ⭐ 150 · Updated 3 months ago
- ⭐ 127 · Updated this week
- ⭐ 92 · Updated last year
- ⭐ 104 · Updated 10 months ago
- The official GitHub repo for "Diffusion Language Models are Super Data Learners". ⭐ 218 · Updated 2 months ago
- ⭐ 37 · Updated 4 months ago
- ⭐ 268 · Updated 7 months ago
- Experiment using Tangent to autodiff Triton ⭐ 81 · Updated 2 years ago
- Dion optimizer algorithm ⭐ 419 · Updated last week
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ⭐ 112 · Updated 7 months ago