amazon-science / adaptive-feature-transfer
Official implementation of Adaptive Feature Transfer (AFT)
☆23Updated last year
Alternatives and similar repositories for adaptive-feature-transfer
Users who are interested in adaptive-feature-transfer are comparing it to the libraries listed below.
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆61Updated last year
- HGRN2: Gated Linear RNNs with State Expansion☆56Updated last year
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision"☆36Updated 2 years ago
- Unofficial Implementation of Selective Attention Transformer☆20Updated last year
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis"☆57Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆82Updated 2 years ago
- An official PyTorch implementation for CLIPPR☆30Updated 2 years ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections☆21Updated last year
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning☆47Updated last year
- [NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization☆39Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]☆70Updated last year
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization"☆55Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆32Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning☆137Updated last month
- Official repo of Progressive Data Expansion: data, code and evaluation☆29Updated 2 years ago
- ☆48Updated last year
- Recycling diverse models☆46Updated 3 years ago
- [ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization☆24Updated 4 months ago
- Latest Weight Averaging (NeurIPS HITY 2022)☆32Updated 2 years ago
- Code for Principal Masked Autoencoders☆30Updated last week
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆32Updated 9 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling☆41Updated last month
- Implementation of Infini-Transformer in PyTorch☆112Updated last year
- Official code for the paper "Attention as a Hypernetwork"☆47Updated last year
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)☆29Updated last year
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆22Updated 2 years ago
- Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise☆40Updated last year
- Measuring the Signal to Noise Ratio in Language Model Evaluation☆28Updated 5 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆58Updated last year
- Experimental scripts for researching data adaptive learning rate scheduling.☆22Updated 2 years ago