NUS-HPC-AI-Lab / pytorch-lamb
PyTorch implementation of LAMB for ImageNet/ResNet-50 training
☆13 Updated 3 years ago
Alternatives and similar repositories for pytorch-lamb:
Users who are interested in pytorch-lamb are comparing it to the libraries listed below.
- Accuracy 77%. Large batch deep learning optimizer LARS for ImageNet with PyTorch and ResNet, using Horovod for distribution. Optional acc… ☆38 Updated 3 years ago
- ☆201 Updated 2 years ago
- ☆35 Updated 3 years ago
- Parameter Efficient Transfer Learning with Diff Pruning ☆73 Updated 4 years ago
- ☆40 Updated 3 years ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆114 Updated 11 months ago
- ☆50 Updated last year
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆50 Updated 10 months ago
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021). ☆58 Updated 3 years ago
- This package implements THOR: Transformer with Stochastic Experts. ☆62 Updated 3 years ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆35 Updated last year
- Code accompanying the NeurIPS 2020 paper: WoodFisher (Singh & Alistarh, 2020) ☆48 Updated 3 years ago
- Block Sparse movement pruning ☆78 Updated 4 years ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆87 Updated 8 months ago
- ☆41 Updated 2 years ago
- Code for "Picking Winning Tickets Before Training by Preserving Gradient Flow" https://openreview.net/pdf?id=SkgsACVKPH ☆101 Updated 5 years ago
- [ICLR 2023] "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!" Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen… ☆27 Updated last year
- PyTorch library for factorized L0-based pruning. ☆44 Updated last year
- Efficient reference implementations of the static & dynamic M-FAC algorithms (for pruning and optimization) ☆16 Updated 3 years ago
- Code for Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot ☆42 Updated 4 years ago
- Soft Threshold Weight Reparameterization for Learnable Sparsity ☆87 Updated 2 years ago
- Towards Understanding Sharpness-Aware Minimization [ICML 2022] ☆35 Updated 2 years ago
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆90 Updated last year
- Lightweight torch implementation of RigL, a sparse-to-sparse optimizer. ☆56 Updated 3 years ago
- pytorch-profiler ☆51 Updated last year
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆146 Updated 4 months ago
- The implementation for MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning" ☆43 Updated last year
- [ICML 2021] "Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training" by Shiwei Liu, Lu Yin, De… ☆46 Updated last year
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 Updated 9 months ago
- Training vision models with full-batch gradient descent and regularization ☆37 Updated 2 years ago