NUS-HPC-AI-Lab / pytorch-lambLinks

PyTorch implementation of LAMB for ImageNet/ResNet-50 training

☆13

Alternatives and similar repositories for pytorch-lamb

Users that are interested in pytorch-lamb are comparing it to the libraries listed below

Sorting:

NUS-HPC-AI-Lab / LARS-ImageNet-PyTorch
Accuracy 77%. Large batch deep learning optimizer LARS for ImageNet with PyTorch and ResNet, using Horovod for distribution. Optional acc…
☆38Updated 4 years ago
HazyResearch / fly
☆221Updated 2 years ago
juntang-zhuang / GSAM
PyTorch repository for ICLR 2022 paper (GSAM) which improves generalization (e.g. +3.8% top-1 accuracy on ImageNet with ViT-B/32)
☆144Updated 3 years ago
verbiiyo / rigl-torch
Lightweight torch implementation of rigl, a sparse-to-sparse optimizer.
☆60Updated 4 years ago
zyushun / hessian-spectrum
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
☆63Updated 8 months ago
varunnair18 / FISH
Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).
☆59Updated 3 years ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81Updated 2 years ago
pipilurj / BONAS
☆35Updated 4 years ago
epfml / powersgd
Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
☆149Updated last year
LiuXiaoxuanPKU / GACT-ICML
☆43Updated 3 years ago
VITA-Group / SMC-Bench
[ICLR 2023] "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!" Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen…
☆28Updated 2 years ago
IST-DASLab / WoodFisher
Code accompanying the NeurIPS 2020 paper: WoodFisher (Singh & Alistarh, 2020)
☆53Updated 4 years ago
lzhangbv / eva
[ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation
☆12Updated 2 years ago
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65Updated 4 years ago
libffcv / ffcv-imagenet
Train ImageNet *fast* in 500 lines of code with FFCV
☆149Updated last year
gpauloski / kfac-pytorch
Distributed K-FAC preconditioner for PyTorch
☆91Updated this week
hushon / JAX-ResNet-CIFAR10
Simple CIFAR10 ResNet example with JAX.
☆23Updated 4 years ago
JonasGeiping / fullbatchtraining
Training vision models with full-batch gradient descent and regularization
☆39Updated 2 years ago
DS3Lab / AC-SGD
Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**.
☆28Updated 2 years ago
DS3Lab / DT-FM
☆94Updated 3 years ago
epfml / dynamic-sparse-flash-attention
☆150Updated 2 years ago
jianweif / OptimalGradCheckpointing
☆41Updated 4 years ago
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆71Updated last year
yxli2123 / LoSparse
☆62Updated 2 years ago
ucbrise / actnn
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
☆199Updated 2 years ago
themrzmaster / git-re-basin-pytorch
Git Re-Basin: Merging Models modulo Permutation Symmetries in PyTorch
☆78Updated 2 years ago
VITA-Group / Junk_DNA_Hypothesis
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
☆16Updated 7 months ago
kashif / ICLR2022-OpenReviewData
Crawl & visualize ICLR papers and reviews.
☆107Updated 3 years ago
alecwangcq / GraSP
Code for "Picking Winning Tickets Before Training by Preserving Gradient Flow" https://openreview.net/pdf?id=SkgsACVKPH
☆105Updated 5 years ago
yuezhouhu / 2by4-pretrain
Efficient 2:4 sparse training algorithms and implementations
☆57Updated 11 months ago