sail-sg / win
☆9Updated 2 years ago
Alternatives and similar repositories for win
Users who are interested in win are comparing it to the libraries listed below
- ☆13Updated 6 months ago
- ☆33Updated 4 months ago
- [ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation☆12Updated last year
- ☆26Updated 2 weeks ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]☆66Updated 9 months ago
- ☆19Updated last year
- ☆36Updated 2 years ago
- supporting pytorch FSDP for optimizers☆82Updated 7 months ago
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective☆59Updated 4 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"☆116Updated last week
- ☆197Updated 7 months ago
- ☆11Updated 4 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax☆89Updated last year
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam☆82Updated 11 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…☆54Updated last year
- Latest Weight Averaging (NeurIPS HITY 2022)☆30Updated 2 years ago
- Griffin MQA + Hawk Linear RNN Hybrid☆87Updated last year
- ☆30Updated 5 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆75Updated last year
- ☆53Updated 9 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)☆80Updated last year
- Git Re-Basin: Merging Models modulo Permutation Symmetries in PyTorch☆76Updated 2 years ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)☆32Updated 8 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆55Updated last year
- Implementation of Infini-Transformer in Pytorch☆111Updated 6 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch☆110Updated 7 months ago
- [ICLR 2025] AdaFisher: Adaptive Second Order Optimization via Fisher Information☆40Updated 5 months ago
- ☆105Updated last year
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.☆68Updated 11 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind☆127Updated 10 months ago