Arongil / lipschitz-transformers
Don't just regulate gradients like in Muon, regulate the weights too
☆31 · Updated 4 months ago
Alternatives and similar repositories for lipschitz-transformers
Users who are interested in lipschitz-transformers are comparing it to the libraries listed below.
- Supporting code for the blog post on modular manifolds. ☆105 · Updated 2 months ago
- Small Batch Size Training for Language Models ☆68 · Updated 2 months ago
- ☆53 · Updated last year
- ☆62 · Updated last year
- WIP ☆93 · Updated last year
- Flash Attention Triton kernel with support for second-order derivatives ☆121 · Updated this week
- ☆27 · Updated 2 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆46 · Updated last year
- Focused on fast experimentation and simplicity ☆76 · Updated 11 months ago
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ☆80 · Updated 6 months ago
- ☆34 · Updated last year
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- Supporting PyTorch FSDP for optimizers ☆84 · Updated last year
- Scalable and Stable Parallelization of Nonlinear RNNs ☆28 · Updated 2 months ago
- ☆32 · Updated last year
- ☆52 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆134 · Updated last month
- ☆34 · Updated last year
- ☆122 · Updated 6 months ago
- ☆42 · Updated last month
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆56 · Updated last year
- ☆35 · Updated last year
- ☆30 · Updated last year
- Deep Networks Grok All the Time and Here is Why ☆38 · Updated last year
- ☆68 · Updated last year
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam ☆84 · Updated last year
- ☆33 · Updated 11 months ago
- Code for the paper "Function-Space Learning Rates" ☆23 · Updated 6 months ago
- ☆13 · Updated 9 months ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆56 · Updated 9 months ago