JeanKaddour / LAWALinks

Latest Weight Averaging (NeurIPS HITY 2022)

☆31

Alternatives and similar repositories for LAWA

Users that are interested in LAWA are comparing it to the libraries listed below

Sorting:

JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81Updated 2 years ago
facebookresearch / ModelRatatouille
Recycling diverse models
☆46Updated 2 years ago
gregorbachmann / scaling_mlps
☆52Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆68Updated last year
MadryLab / datamodels-data
Data for "Datamodels: Predicting Predictions with Training Data"
☆97Updated 2 years ago
ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆45Updated last month
locuslab / T-MARS
Code for T-MARS data filtering
☆35Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53Updated 2 years ago
JonasGeiping / dataaugs
☆18Updated 3 years ago
KellerJordan / REPAIR
Code release for REPAIR: REnormalizing Permuted Activations for Interpolation Repair
☆51Updated last year
stanislavfort / dissect-git-re-basin
Replicating and dissecting the git-re-basin project in one-click-replication Colabs
☆35Updated 3 years ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85Updated last year
mcleish7 / gemstone-scaling-laws
Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)
☆29Updated last month
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16Updated 5 months ago
tml-epfl / sharpness-vs-generalization
A modern look at the relationship between sharpness and generalization [ICML 2023]
☆43Updated 2 years ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆46Updated last year
LIONS-EPFL / scion
☆47Updated last month
facebookresearch / ViP-MAE
This is a PyTorch implementation of the paperViP A Differentially Private Foundation Model for Computer Vision
☆36Updated 2 years ago
oripress / EntropyEnigma
Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization"
☆55Updated last year
MadryLab / modeldiff
ModelDiff: A Framework for Comparing Learning Algorithms
☆58Updated 2 years ago
aks2203 / deep-thinking
A centralized place for deep thinking code and experiments
☆87Updated 2 years ago
dydjw9 / Efficient_SAM
☆58Updated 2 years ago
lucidrains / discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in Pytorch
☆88Updated 2 years ago
wang-kee / LiNeS
Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"
☆31Updated last year
mlfoundations / patching
Patching open-vocabulary models by interpolating weights
☆91Updated 2 years ago
shoaibahmed / metadata_archaeology
Official code for the paper: "Metadata Archaeology"
☆19Updated 2 years ago
Ping-C / optimizer
This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…
☆40Updated 2 years ago
Qualcomm-AI-research / llm-surgeon
☆34Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 5 months ago