lxuechen / ml-swissknife
An ML research codebase built with friends :)
☆23 Updated 7 months ago
Alternatives and similar repositories for ml-swissknife:
Users who are interested in ml-swissknife are comparing it to the repositories listed below:
- SGD with large step sizes learns sparse features [ICML 2023] ☆32 Updated last year
- [NeurIPS'20] Code for the Paper Compositional Visual Generation and Inference with Energy Based Models ☆44 Updated 2 years ago
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆58 Updated last year
- [ICML'21] Improved Contrastive Divergence Training of Energy Based Models ☆62 Updated 2 years ago
- Transformers with doubly stochastic attention ☆45 Updated 2 years ago
- ☆52 Updated 6 months ago
- ☆17 Updated 2 years ago
- ☆16 Updated last year
- Euclidean Wasserstein-2 optimal transportation ☆46 Updated last year
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆43 Updated last year
- ☆28 Updated last year
- ☆18 Updated 2 years ago
- ☆27 Updated last year
- ☆22 Updated 3 years ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 Updated last year
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆36 Updated 2 years ago
- ModelDiff: A Framework for Comparing Learning Algorithms ☆56 Updated last year
- Blog post ☆17 Updated last year
- Code for the ICLR 2020 Paper, "A Theory of Usable Information under Computational Constraints" ☆26 Updated 4 years ago
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS '22) ☆16 Updated 2 years ago
- Code to implement the AND-mask and geometric mean to do gradient based optimization, from the paper "Learning explanations that are hard … ☆39 Updated 4 years ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated last year
- Code for "SAM as an Optimal Relaxation of Bayes", ICLR 2023. ☆25 Updated last year
- ☆31 Updated 4 years ago
- ☆22 Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆63 Updated 6 months ago
- ☆65 Updated 3 months ago
- ☆41 Updated 2 years ago
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021). ☆58 Updated 3 years ago
- Source code of "What can linearized neural networks actually say about generalization?" ☆20 Updated 3 years ago