MarlonBecker / MSAM

☆20

Alternatives and similar repositories for MSAM

Users who are interested in MSAM are comparing it to the repositories listed below.

mueller-mp / SAM-ON
☆34 Updated last year
lzhangbv / eva
[ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation
☆12 Updated 2 years ago
locuslab / edge-of-stability
☆73 Updated last year
LIONS-EPFL / scion
☆49 Updated last month
andyjm3 / SLTrain
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
☆38 Updated last year
AngusDujw / SAF
☆35 Updated 3 years ago
tml-epfl / sharpness-vs-generalization
A modern look at the relationship between sharpness and generalization [ICML 2023]
☆43 Updated 2 years ago
nblt / RWP
☆11 Updated 3 years ago
IlanPrice / DCTpS
Code for testing DCT plus Sparse (DCTpS) networks
☆14 Updated 4 years ago
hushon / JAX-ResNet-CIFAR10
Simple CIFAR10 ResNet example with JAX.
☆23 Updated 4 years ago
tml-epfl / understanding-sam
Towards Understanding Sharpness-Aware Minimization [ICML 2022]
☆36 Updated 3 years ago
gortizji / linearized-networks
Source code of "What can linearized neural networks actually say about generalization?"
☆20 Updated 4 years ago
fKunstner / noise-sgd-adam-sign
☆16 Updated 2 years ago
chengxiang / LinearTransformer
PyTorch code for experiments on Linear Transformers
☆24 Updated last year
cjyaras / deep-lora-transformers
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation (ICML'24 Oral)
☆13 Updated last year
zyushun / hessian-spectrum
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
☆63 Updated 8 months ago
reds-lab / LAVA
This is an official repository for "LAVA: Data Valuation without Pre-Specified Learning Algorithms" (ICLR2023).
☆51 Updated last year
MaximeRobeyns / bayesian_lora
Bayesian Low-Rank Adaptation for Large Language Models
☆36 Updated last year
formll / dog
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
☆63 Updated 2 years ago
yolky / RFAD
Code for the paper "Efficient Dataset Distillation using Random Feature Approximation"
☆37 Updated 2 years ago
themrzmaster / git-re-basin-pytorch
Git Re-Basin: Merging Models modulo Permutation Symmetries in PyTorch
☆78 Updated 2 years ago
tml-epfl / sam-low-rank-features
Sharpness-Aware Minimization Leads to Low-Rank Features [NeurIPS 2023]
☆28 Updated 2 years ago
dydjw9 / Efficient_SAM
☆58 Updated 2 years ago
gortizji / tangent_task_arithmetic
Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models".
☆107 Updated 2 years ago
zhaoyang-0204 / gnp
gradient norm penalty
☆41 Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81 Updated 2 years ago
epfml / REQ
☆18 Updated last year
JonasGeiping / fullbatchtraining
Training vision models with full-batch gradient descent and regularization
☆39 Updated 2 years ago
nitarshan / robust-generalization-measures
Official code for "In Search of Robust Measures of Generalization" (NeurIPS 2020)
☆28 Updated 4 years ago
r-three / mats
☆32 Updated last year