tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆49 · Updated last month
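For context on the headline paper's topic: the snippet below is a minimal, hypothetical sketch of decoupled weight decay (AdamW-style, via PyTorch's `torch.optim.AdamW`), not code from this repository; the toy model, data, and coefficient 0.1 are illustrative assumptions.

```python
# Minimal illustrative sketch (not from tml-epfl/why-weight-decay):
# decoupled weight decay a la AdamW, where the decay term lambda * w
# is applied directly in the update rather than added to the loss.
import torch

model = torch.nn.Linear(10, 1)  # toy model, an assumption
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()   # update: w <- w - lr * (adam_direction + weight_decay * w)
opt.zero_grad()
```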
Related projects
Alternatives and complementary repositories for why-weight-decay
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective" ☆40 · Updated 6 months ago
- Blog post ☆16 · Updated 8 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆23 · Updated 4 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 · Updated 11 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆48 · Updated 2 months ago
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) ☆19 · Updated this week
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · Updated this week
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆26 · Updated 3 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 · Updated last year
- A PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆37 · Updated last year
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 · Updated 2 years ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆61 · Updated 6 months ago
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆15 · Updated 2 years ago
- Code for T-MARS data filtering ☆35 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ☆89 · Updated 2 months ago