rollovd / LookSAM
This is an unofficial repository for "Towards Efficient and Scalable Sharpness-Aware Minimization".
☆36 Updated last year
Alternatives and similar repositories for LookSAM:
Users who are interested in LookSAM are comparing it to the libraries listed below.
- ☆18 Updated last year
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆30 Updated 6 months ago
- ☆35 Updated 2 years ago
- ☆34 Updated last year
- ☆11 Updated 2 years ago
- [ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation ☆12 Updated last year
- Code for the paper "Efficient Dataset Distillation using Random Feature Approximation" ☆37 Updated 2 years ago
- ☆58 Updated 2 years ago
- ☆17 Updated 11 months ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation ☆44 Updated last year
- Sharpness-Aware Minimization Leads to Low-Rank Features [NeurIPS 2023] ☆28 Updated last year
- gradient norm penalty ☆39 Updated 10 months ago
- Git Re-Basin: Merging Models modulo Permutation Symmetries in PyTorch ☆75 Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 Updated 7 months ago
- Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation (ICML'24 Oral) ☆14 Updated 9 months ago
- Prospect Pruning: Finding Trainable Weights at Initialization Using Meta-Gradients ☆31 Updated 3 years ago
- Deep Learning & Information Bottleneck ☆60 Updated last year
- Latest Weight Averaging (NeurIPS HITY 2022) ☆30 Updated last year
- Simple CIFAR10 ResNet example with JAX. ☆23 Updated 3 years ago
- PyTorch repository for ICLR 2022 paper (GSAM) which improves generalization (e.g. +3.8% top-1 accuracy on ImageNet with ViT-B/32) ☆143 Updated 2 years ago
- ☆28 Updated last month
- Weight-Averaged Sharpness-Aware Minimization (NeurIPS 2022) ☆28 Updated 2 years ago
- Towards Understanding Sharpness-Aware Minimization [ICML 2022] ☆35 Updated 2 years ago
- This is the official implementation of the ICML 2023 paper - Can Forward Gradient Match Backpropagation? ☆12 Updated last year
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆57 Updated last month
- Source code of "What can linearized neural networks actually say about generalization?" ☆20 Updated 3 years ago
- This repository is the official implementation of Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021)(http://… ☆24 Updated 2 years ago
- Training vision models with full-batch gradient descent and regularization ☆37 Updated 2 years ago
- Compression schema for gradients of activations in backward pass ☆44 Updated last year
- ☆12 Updated 3 months ago