riverstone496 / awesome-second-order-optimization
☆25 · Updated last year
Alternatives and similar repositories for awesome-second-order-optimization:
Users interested in awesome-second-order-optimization are comparing it to the libraries listed below.
- Implementation of PSGD optimizer in JAX ☆28 · Updated last month
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆70 · Updated 3 months ago
- ☆49 · Updated last year
- Supporting PyTorch FSDP for optimizers ☆76 · Updated 2 months ago
- ☆53 · Updated last year
- ☆75 · Updated 7 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆60 · Updated 4 months ago
- ☆52 · Updated 4 months ago
- ☆158 · Updated 2 months ago
- Experiment of using Tangent to autodiff Triton ☆75 · Updated last year
- WIP ☆93 · Updated 6 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆102 · Updated 2 months ago
- Flow-matching algorithms in JAX ☆83 · Updated 6 months ago
- ☆32 · Updated 8 months ago
- 🧱 Modula software package ☆145 · Updated this week
- A basic pure PyTorch implementation of flash attention ☆16 · Updated 3 months ago
- Stick-breaking attention ☆43 · Updated last month
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆131 · Updated last week
- Code for https://arxiv.org/abs/2406.04329 ☆58 · Updated 2 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆83 · Updated last week
- Minimal but scalable implementation of large language models in JAX ☆32 · Updated 3 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆57 · Updated 3 weeks ago
- ☆40 · Updated 2 months ago
- ☆51 · Updated 8 months ago
- ☆26 · Updated last month
- nanoGPT-like codebase for LLM training ☆89 · Updated this week
- ☆31 · Updated 10 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆119 · Updated last week
- ☆51 · Updated 9 months ago
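
For context on the list's theme: below is a minimal sketch of the damped Newton step that second-order optimizers like those above (PSGD, PSGD Kron, etc.) approximate at scale. This is my own illustration in PyTorch, not code from any listed repository; the toy quadratic and the damping constant are arbitrary choices for demonstration.

```python
# A damped Newton step: update = -(H + lambda*I)^{-1} g,
# versus the first-order update -lr * g.
import torch

def loss_fn(w):
    # Toy ill-conditioned quadratic; plain gradient descent converges slowly here.
    A = torch.diag(torch.tensor([100.0, 1.0]))
    return 0.5 * w @ A @ w

w = torch.tensor([1.0, 1.0])
for step in range(5):
    g = torch.autograd.functional.jacobian(loss_fn, w)  # gradient
    H = torch.autograd.functional.hessian(loss_fn, w)   # exact Hessian
    damping = 1e-4 * torch.eye(H.shape[0])              # Tikhonov/LM damping
    w = w - torch.linalg.solve(H + damping, g)          # Newton update
    print(step, loss_fn(w).item())
```

Computing and inverting the exact Hessian is only feasible for tiny problems like this one; the repositories above are largely concerned with approximating this preconditioned step cheaply at neural-network scale.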