microsoft / AdaMix
This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.12410).
☆126 · Updated last year
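For context, the core idea behind AdaMix is a mixture of adaptation modules (e.g. bottleneck adapters): training stochastically routes each pass through one module, and the modules are merged by weight averaging at inference so serving cost matches a single adapter. The sketch below is a minimal, hypothetical PyTorch illustration of that routing-then-merging pattern; the class and parameter names (`MixtureOfAdapters`, `bottleneck_dim`, `num_adapters`) are invented for illustration, do not reflect this repository's actual API, and the paper's consistency regularization between stochastic passes is omitted.

```python
import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    """Illustrative mixture-of-adaptations layer (not the repo's actual API).

    Training: each forward pass is stochastically routed through one adapter.
    Inference: adapter weights are averaged, so the merged module costs the
    same as a single adapter.
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 16, num_adapters: int = 4):
        super().__init__()
        self.down = nn.ModuleList(nn.Linear(hidden_dim, bottleneck_dim) for _ in range(num_adapters))
        self.up = nn.ModuleList(nn.Linear(bottleneck_dim, hidden_dim) for _ in range(num_adapters))
        self.act = nn.GELU()
        self.num_adapters = num_adapters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Stochastic routing: sample one adaptation module per forward pass.
            i = int(torch.randint(self.num_adapters, (1,)))
            h = self.up[i](self.act(self.down[i](x)))
        else:
            # Merging: average the adapter weights across modules, then apply once.
            w_d = torch.stack([m.weight for m in self.down]).mean(dim=0)
            b_d = torch.stack([m.bias for m in self.down]).mean(dim=0)
            w_u = torch.stack([m.weight for m in self.up]).mean(dim=0)
            b_u = torch.stack([m.bias for m in self.up]).mean(dim=0)
            h = self.act(x @ w_d.T + b_d) @ w_u.T + b_u
        return x + h  # residual connection around the adapter
```

In the usual adapter setup, a module like this would be inserted after each Transformer sublayer with the backbone weights kept frozen, so only the small adaptation modules are trained.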
Related projects
Alternatives and complementary repositories for AdaMix
- Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models ☆133 · Updated 2 years ago
- The original Backpack Language Model implementation, a fork of FlashAttention ☆64 · Updated last year
- ☆126 · Updated 2 years ago
- Contrastive decoding ☆181 · Updated 2 years ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆127 · Updated 6 months ago
- Language models scale reliably with over-training and on downstream tasks ☆94 · Updated 7 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆138 · Updated 2 months ago
- This is the official repository for "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts" (EMNLP 2022) ☆97 · Updated last year
- Retrieval as Attention ☆83 · Updated last year
- [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning" ☆92 · Updated last year
- ☆38 · Updated 7 months ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆97 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆44 · Updated last year
- Code for ACL 2023 paper: Pre-Training to Learn in Context ☆106 · Updated 3 months ago
- ☆86 · Updated 5 months ago
- Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning" ☆161 · Updated 3 years ago
- [TMLR'23] Contrastive Search Is What You Need For Neural Text Generation ☆118 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆101 · Updated last year
- DEMix Layers for Modular Language Modeling ☆53 · Updated 3 years ago
- Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx… ☆136 · Updated last year
- Code for paper 'Data-Efficient FineTuning' ☆29 · Updated last year
- ☆80 · Updated 2 years ago
- Progressive Prompts: Continual Learning for Language Models ☆90 · Updated last year
- Building modular LMs with parameter-efficient fine-tuning. ☆83 · Updated this week
- MEND: Fast Model Editing at Scale ☆235 · Updated last year
- ☆108 · Updated 4 months ago
- DiffusER: Discrete Diffusion via Edit-based Reconstruction (Reid, Hellendoorn & Neubig, 2022) ☆54 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆148 · Updated 8 months ago
- The original implementation of Min et al., "Nonparametric Masked Language Modeling" (paper: https://arxiv.org/abs/2212.01349) ☆156 · Updated last year
- An original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi ☆254 · Updated last year