wjxts / RegularizedBN
☆21 · Updated 2 years ago
Alternatives and similar repositories for RegularizedBN:
Users interested in RegularizedBN are comparing it to the libraries listed below.
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 · Updated 2 years ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 · Updated last year
- Code for T-MARS data filtering ☆35 · Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns… ☆16 · Updated last year
- ☆18 · Updated 8 months ago
- Mixture of Attention Heads ☆42 · Updated 2 years ago
- Official implementation of the paper: "A deeper look at depth pruning of LLMs" ☆14 · Updated 8 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023. ☆32 · Updated last year
- Implementation of "Beyond neural scaling laws: beating power law scaling" for deep models and prototype-based models ☆33 · Updated 3 months ago
- [ICLR 2021] "Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning" by Tianlong Chen*, Zhenyu Zhang*, Sijia Liu, S… ☆25 · Updated 3 years ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers ☆25 · Updated 3 weeks ago
- ☆15 · Updated last year
- Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award) ☆55 · Updated last year
- Structured Pruning Adapters in PyTorch ☆16 · Updated last year
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 · Updated 9 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆53 · Updated 7 months ago
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… ☆40 · Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 9 months ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections ☆18 · Updated 5 months ago
- On the Effectiveness of Parameter-Efficient Fine-Tuning ☆38 · Updated last year
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆15 · Updated 2 years ago
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization ☆29 · Updated 2 years ago
- ☆28 · Updated 8 months ago
- ☆17 · Updated 2 months ago
- [ACL 2023] Code for paper "Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation" (https://arxiv.org/abs/2305.…) ☆38 · Updated last year
- Parameter Efficient Transfer Learning with Diff Pruning ☆73 · Updated 4 years ago
- ☆34 · Updated 8 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆27 · Updated 11 months ago
- ☆57 · Updated 2 years ago