JeanKaddour / LAWA
Latest Weight Averaging (NeurIPS HITY 2022)
☆28 Updated last year
Alternatives and similar repositories for LAWA:
Users who are interested in LAWA are comparing it to the repositories listed below.
- Recycling diverse models ☆44 Updated 2 years ago
- Code for T-MARS data filtering ☆35 Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 Updated last year
- Triton Implementation of HyperAttention Algorithm ☆47 Updated last year
- ☆30 Updated 2 months ago
- ☆17 Updated 2 years ago
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization ☆29 Updated 2 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last year
- ☆18 Updated 8 months ago
- Implementation of Bitune: Bidirectional Instruction-Tuning ☆19 Updated 9 months ago
- Official code for the paper: "Metadata Archaeology" ☆19 Updated last year
- The repository contains code for Adaptive Data Optimization ☆20 Updated 3 months ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆12 Updated this week
- Official code for the paper "Attention as a Hypernetwork" ☆25 Updated 9 months ago
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆15 Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆63 Updated 5 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆24 Updated 4 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated last year
- ☆52 Updated 5 months ago
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆49 Updated 9 months ago
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 Updated last year
- ☆28 Updated 8 months ago
- ☆33 Updated 6 months ago
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆43 Updated last year
- Repository for the PopulAtion Parameter Averaging (PAPA) paper ☆26 Updated 11 months ago
- ☆28 Updated last year
- ☆51 Updated 9 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 Updated last week