NUS-HPC-AI-Lab / pytorch-lamb
PyTorch implementation of LAMB for ImageNet/ResNet-50 training
☆14 · Updated 3 years ago
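For context, LAMB (You et al., 2019) applies a layer-wise trust ratio to Adam-style updates so that very large batch sizes train stably. Below is a minimal, illustrative sketch of that update rule; `MinimalLAMB` is a hypothetical name, and this is not the implementation shipped in this repository.

```python
# Illustrative sketch of the LAMB update rule (You et al., 2019).
# Not this repository's implementation; names here are hypothetical.
import torch
from torch.optim import Optimizer


class MinimalLAMB(Optimizer):
    """Adam moment estimates scaled by a layer-wise trust ratio."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-6, weight_decay=0.01):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                m, v = state["exp_avg"], state["exp_avg_sq"]

                # Bias-corrected Adam first and second moment estimates.
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                m_hat = m / (1 - beta1 ** state["step"])
                v_hat = v / (1 - beta2 ** state["step"])

                # Adam direction plus decoupled weight decay.
                update = m_hat / (v_hat.sqrt() + group["eps"])
                update = update.add(p, alpha=group["weight_decay"])

                # Layer-wise trust ratio ||w|| / ||update||: every layer
                # takes a step of comparable relative size, which is what
                # keeps very large batch sizes stable.
                w_norm, u_norm = p.norm(), update.norm()
                trust = 1.0
                if w_norm > 0 and u_norm > 0:
                    trust = (w_norm / u_norm).item()
                p.add_(update, alpha=-group["lr"] * trust)
```

Usage follows the standard `torch.optim` pattern, e.g. `optimizer = MinimalLAMB(model.parameters(), lr=1e-3)` inside an ordinary training loop.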
Related projects
Alternatives and complementary repositories for pytorch-lamb
- Large-batch deep learning optimizer LARS for ImageNet with PyTorch and ResNet, using Horovod for distribution; reaches 77% accuracy. A minimal sketch of the LARS update follows this list. Optional acc… ☆38 · Updated 3 years ago
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021). ☆56 · Updated 2 years ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- ☆47 · Updated last year
- [ICLR 2023] "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!" Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen… ☆27 · Updated last year
- Parameter Efficient Transfer Learning with Diff Pruning ☆72 · Updated 3 years ago
- Open-source code for the paper "On the Learning and Learnability of Quasimetrics". ☆32 · Updated last year
- [NeurIPS 2022] "Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation", Ziyu Jiang*, Xuxi Chen*, Xueqin Huan… ☆19 · Updated last year
- Block Sparse movement pruning ☆78 · Updated 3 years ago
- The implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning" ☆43 · Updated last year
- Code for "Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot" ☆42 · Updated 4 years ago
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆91 · Updated last year
- Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award) ☆53 · Updated last year
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective" ☆43 · Updated 6 months ago
- [ATTRIB @ NeurIPS 2024 Oral] When Attention Sink Emerges in Language Models: An Empirical View ☆29 · Updated last month
- Code accompanying the NeurIPS 2020 paper WoodFisher (Singh & Alistarh, 2020) ☆46 · Updated 3 years ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆15 · Updated 5 months ago
- ☆39 · Updated 3 years ago
- This package implements THOR: Transformer with Stochastic Experts. ☆61 · Updated 3 years ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆30 · Updated 8 months ago
- Code associated with the paper "Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees". ☆26 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆110 · Updated 8 months ago
- A small repository demonstrating the use of WebDataset and ImageNet ☆14 · Updated 11 months ago
- Stick-breaking attention ☆34 · Updated 2 weeks ago
- Codebase used in the paper "Foundational Models for Continual Learning: An Empirical Study of Latent Replay". ☆30 · Updated last year
- Patch convolution to avoid large GPU memory usage of Conv2D ☆79 · Updated 5 months ago
- [ICML 2021] "Efficient Lottery Ticket Finding: Less Data is More" by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang ☆25 · Updated 2 years ago
- Hosts the CIFAR-10.2 data set ☆13 · Updated 3 years ago
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton. ☆54 · Updated 3 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆16 · Updated 5 months ago
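As referenced in the first entry above, LARS (You et al., 2017) is the SGD counterpart of LAMB: momentum SGD whose per-layer step is rescaled by a trust ratio. The sketch below is illustrative only, assuming the standard LARS formulation; it is not the linked repository's Horovod-distributed implementation, and `MinimalLARS` is a hypothetical name.

```python
# Illustrative sketch of the LARS update (You et al., 2017).
# Not the linked repository's code; names here are hypothetical.
import torch
from torch.optim import Optimizer


class MinimalLARS(Optimizer):
    """Momentum SGD with a layer-wise trust ratio on the learning rate."""

    def __init__(self, params, lr=0.1, momentum=0.9,
                 weight_decay=5e-4, trust_coef=1e-3):
        defaults = dict(lr=lr, momentum=momentum,
                        weight_decay=weight_decay, trust_coef=trust_coef)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Gradient with weight decay folded in.
                g = p.grad.add(p, alpha=group["weight_decay"])

                # Layer-wise learning rate: lr * eta * ||w|| / ||g||.
                w_norm, g_norm = p.norm(), g.norm()
                local_lr = group["lr"]
                if w_norm > 0 and g_norm > 0:
                    local_lr *= group["trust_coef"] * (w_norm / g_norm).item()

                # Heavy-ball momentum on the locally scaled gradient.
                buf = self.state[p].setdefault(
                    "momentum_buffer", torch.zeros_like(p))
                buf.mul_(group["momentum"]).add_(g, alpha=local_lr)
                p.sub_(buf)
```

The trust coefficient of 1e-3 mirrors the value suggested in the LARS paper for ResNet-50-scale training; as with LAMB above, the trust ratio equalizes the relative step size across layers, which is what allows the batch size to grow without destabilizing early training.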