☆262 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for SOAP
Users that are interested in SOAP are comparing it to the libraries listed below.
- Efficient optimizers · ☆310 · Apr 4, 2026 · Updated last week
- ☆70 · Nov 15, 2024 · Updated last year
- For optimization algorithm research and development. · ☆563 · Updated this week
- ☆32 · Mar 14, 2025 · Updated last year
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… · ☆194 · Apr 3, 2026 · Updated last week
- WIP · ☆95 · Aug 13, 2024 · Updated last year
- An implementation of PSGD Kron second-order optimizer for PyTorch · ☆99 · Jul 24, 2025 · Updated 8 months ago
- ☆10 · Jun 27, 2024 · Updated last year
- Focused on fast experimentation and simplicity · ☆80 · Dec 24, 2024 · Updated last year
- Schedule-Free Optimization in PyTorch · ☆2,271 · May 21, 2025 · Updated 10 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 · ☆455 · May 13, 2025 · Updated 11 months ago
- Combining SOAP and MUON · ☆19 · Feb 11, 2025 · Updated last year
- ☆60 · Updated this week
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆92 · Oct 30, 2024 · Updated last year
- Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 Workshop) · ☆17 · Mar 6, 2025 · Updated last year
- Muon is an optimizer for hidden layers in neural networks · ☆2,479 · Jan 19, 2026 · Updated 2 months ago
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers. · ☆19 · Jul 24, 2025 · Updated 8 months ago
- ☆15 · Mar 2, 2025 · Updated last year
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" · ☆14 · May 26, 2025 · Updated 10 months ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code · ☆10 · Aug 29, 2023 · Updated 2 years ago
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective · ☆65 · Mar 11, 2025 · Updated last year
- ☆19 · Dec 4, 2025 · Updated 4 months ago
- Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" · ☆435 · Dec 12, 2024 · Updated last year
- 🧱 Modula software package · ☆326 · Aug 18, 2025 · Updated 7 months ago
- DeMo: Decoupled Momentum Optimization · ☆198 · Dec 2, 2024 · Updated last year
- ☆54 · May 20, 2024 · Updated last year
- ☆63 · Oct 3, 2024 · Updated last year
- ☆22 · Nov 9, 2024 · Updated last year
- A library for unit scaling in PyTorch · ☆133 · Jul 11, 2025 · Updated 9 months ago
- ☆173 · Apr 7, 2026 · Updated last week
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… · ☆66 · Nov 18, 2025 · Updated 4 months ago
- ☆13 · Apr 1, 2026 · Updated 2 weeks ago
- Utilities for PyTorch distributed · ☆25 · Feb 27, 2025 · Updated last year
- Minimal but scalable implementation of large language models in JAX · ☆35 · Nov 28, 2025 · Updated 4 months ago
- [NeurIPS 2025] Official Pytorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang L… · ☆70 · Mar 3, 2026 · Updated last month
- Code for "What really matters in matrix-whitening optimizers?" · ☆23 · Oct 31, 2025 · Updated 5 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆132 · Jun 24, 2025 · Updated 9 months ago
- ☆124 · Jun 11, 2025 · Updated 10 months ago
- Experiments on the impact of depth in transformers and SSMs. · ☆41 · Oct 23, 2025 · Updated 5 months ago