D-Adaptation for SGD, Adam and AdaGrad
☆529 · Jan 22, 2025 · Updated last year
Alternatives and similar repositories for dadaptation
Users who are interested in dadaptation are comparing it to the libraries listed below.
- The Prodigy optimizer and its variants for training neural networks. ☆450 · Jan 16, 2025 · Updated last year
- Schedule-Free Optimization in PyTorch ☆2,257 · May 21, 2025 · Updated 9 months ago
- ☆36 · Jan 23, 2024 · Updated 2 years ago
- Experimental scripts for researching data adaptive learning rate scheduling. ☆22 · Oct 18, 2023 · Updated 2 years ago
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” ☆981 · Jan 30, 2024 · Updated 2 years ago
- Euclidean Wasserstein-2 optimal transportation ☆46 · Aug 19, 2023 · Updated 2 years ago
- ☆213 · Oct 10, 2022 · Updated 3 years ago
- maximal update parametrization (µP) ☆1,686 · Jul 17, 2024 · Updated last year
- Hackable and optimized Transformers building blocks, supporting a composable construction. ☆10,353 · Feb 20, 2026 · Updated last week
- Effortless plugin and play Optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs. ☆381 · Jun 4, 2024 · Updated last year
- FFCV-SSL Fast Forward Computer Vision for Self-Supervised Learning. ☆212 · Aug 1, 2023 · Updated 2 years ago
- Named tensors with first-class dimensions for PyTorch ☆332 · Jun 14, 2023 · Updated 2 years ago
- A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries. ☆1,249 · Updated this week
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Aug 12, 2023 · Updated 2 years ago
- A library for unit scaling in PyTorch ☆133 · Jul 11, 2025 · Updated 7 months ago
- TensorDict is a pytorch dedicated tensor container. ☆1,009 · Updated this week
- AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights (ICLR 2021) ☆415 · Jan 13, 2021 · Updated 5 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- For optimization algorithm research and development. ☆558 · Feb 23, 2026 · Updated last week
- Accessible large language models via k-bit quantization for PyTorch. ☆7,997 · Updated this week
- Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion. ☆2,478 · Feb 12, 2026 · Updated 2 weeks ago
- Scaling Data-Constrained Language Models ☆342 · Jun 28, 2025 · Updated 8 months ago
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆9,401 · Feb 20, 2026 · Updated last week
- Type annotations and dynamic checking for a tensor's shape, dtype, names, etc. ☆1,472 · May 2, 2025 · Updated 10 months ago
- Fast, Modern, and Low Precision PyTorch Optimizers ☆125 · Dec 29, 2025 · Updated 2 months ago
- FFCV: Fast Forward Computer Vision (and other ML workloads!) ☆2,985 · Jun 16, 2024 · Updated last year
- Convolutions for Sequence Modeling ☆912 · Jun 13, 2024 · Updated last year
- SAM: Sharpness-Aware Minimization (PyTorch) ☆1,963 · Feb 21, 2024 · Updated 2 years ago
- MADGRAD Optimization Method ☆801 · Jan 27, 2025 · Updated last year
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆413 · Feb 11, 2026 · Updated 2 weeks ago
- A playbook for systematically maximizing the performance of deep learning models. ☆29,861 · Jun 18, 2024 · Updated last year
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models ☆807 · Jun 8, 2025 · Updated 8 months ago
- PyTorch extensions for high performance and large scale training. ☆3,400 · Apr 26, 2025 · Updated 10 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 · Apr 17, 2024 · Updated last year
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,363 · Jun 13, 2024 · Updated last year
- Trains Transformer model variants. Data isn't shuffled between batches. ☆143 · Oct 5, 2022 · Updated 3 years ago
- ☆316 · Jun 21, 2024 · Updated last year
- [ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators ☆26 · Jul 26, 2023 · Updated 2 years ago
- ☆22 · Nov 9, 2024 · Updated last year