sail-sg / win
☆9 · Updated 2 years ago
Alternatives and similar repositories for win
Users interested in win are comparing it to the libraries listed below.
- ☆13 · Updated 6 months ago
- ☆34 · Updated 4 months ago
- [ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation ☆12 · Updated 2 years ago
- ☆33 · Updated 3 weeks ago
- ☆19 · Updated last year
- ☆36 · Updated 2 years ago
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆60 · Updated 4 months ago
- Supporting PyTorch FSDP for optimizers ☆84 · Updated 8 months ago
- Unofficial Implementation of Selective Attention Transformer ☆17 · Updated 9 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 · Updated 10 months ago
- ☆11 · Updated 5 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆118 · Updated last month
- ☆21 · Updated 2 years ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam ☆84 · Updated last year
- ☆70 · Updated 6 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆55 · Updated last year
- ☆206 · Updated 8 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆80 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆88 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆110 · Updated 7 months ago
- ☆53 · Updated 10 months ago
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) ☆19 · Updated 3 weeks ago
- ☆31 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆76 · Updated last year
- [CVPR 2024] Friendly Sharpness-Aware Minimization ☆34 · Updated 9 months ago
- ☆106 · Updated last year
- SLTrain: a sparse plus low-rank approach for parameter- and memory-efficient pretraining (NeurIPS 2024) ☆32 · Updated 9 months ago
- Official PyTorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefen… ☆55 · Updated 2 weeks ago
- Implementations of various linear RNN layers using PyTorch and Triton ☆53 · Updated 2 years ago
- Experiments on Multi-Head Latent Attention ☆94 · Updated 11 months ago