This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆114 · Updated May 2, 2022
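The core idea behind MoEBERT's importance-guided adaptation can be illustrated with a toy sketch: the FFN hidden units deemed most important are shared by every expert, while the remaining units are split disjointly among the experts. The function name, parameters, and the NumPy-based scoring below are illustrative assumptions, not the official repo's API; the paper's actual importance scores come from the model's training signal.

```python
import numpy as np

def split_ffn_into_experts(importance, num_experts=4, num_shared=2):
    """Partition FFN hidden-unit indices into per-expert index sets.

    Sketch of importance-guided adaptation (as described in the MoEBERT
    paper): the top-`num_shared` most important neurons are shared by
    every expert; the rest are divided disjointly among the experts.
    `importance` holds one score per hidden unit; how those scores are
    computed is an assumption here, not the official implementation.
    """
    order = np.argsort(importance)[::-1]        # most important first
    shared = order[:num_shared]                 # kept in every expert
    rest = order[num_shared:]
    chunks = np.array_split(rest, num_experts)  # disjoint leftovers
    return [np.concatenate([shared, c]) for c in chunks]

# Toy example: 8 hidden units, 2 experts, top-2 neurons shared.
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.4])
experts = split_ffn_into_experts(scores, num_experts=2, num_shared=2)
```

With these scores, units 1 and 3 (the two highest) appear in both experts, and the remaining six units are split three per expert, so each expert ends up with five of the original eight hidden units.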
Alternatives and similar repositories for MoEBERT
Users interested in MoEBERT compare it with the libraries listed below.
- ☆143 · Updated Jul 21, 2024
- [AAAI 2021] "ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques", Yuanxin Liu, Zheng Lin, Fengcheng Yuan · ☆14 · Updated Oct 18, 2022
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers · ☆26 · Updated Jun 7, 2023
- This package implements THOR: Transformer with Stochastic Experts. · ☆64 · Updated Oct 7, 2021
- Mixture of Attention Heads · ☆51 · Updated Oct 10, 2022
- Code for the ACL 2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts" · ☆51 · Updated Jul 17, 2022
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). · ☆46 · Updated Oct 17, 2022
- ☆274 · Updated Oct 31, 2023
- [ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723 · ☆730 · Updated Aug 29, 2022
- ☆19 · Updated Oct 31, 2022
- Beyond KV Caching: Shared Attention for Efficient LLMs · ☆20 · Updated Jul 19, 2024
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) · ☆44 · Updated Feb 18, 2026
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… · ☆44 · Updated Aug 10, 2024
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" · ☆21 · Updated Jul 18, 2023
- Implementation of the ACL Findings paper "OutFlip: Generating Examples for Unknown Intent Detection with Natural Language Attack" · ☆10 · Updated May 24, 2021
- The contrastive token loss function for reducing generative repetition of autoregressive neural language models. · ☆13 · Updated May 11, 2022
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens · ☆25 · Updated Nov 6, 2023
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. · ☆50 · Updated Jun 16, 2023
- Code for COMET: Cardinality Constrained Mixture of Experts with Trees and Local Search · ☆11 · Updated Jun 21, 2023
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization · ☆18 · Updated Mar 7, 2025
- ☆14 · Updated Nov 23, 2023
- Simple Questions Generate Named Entity Recognition Datasets (EMNLP 2022) · ☆76 · Updated Apr 10, 2023
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models · ☆1,663 · Updated Mar 8, 2024
- Source code for the NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference" · ☆48 · Updated May 25, 2022
- A fast MoE implementation for PyTorch · ☆1,840 · Updated Feb 10, 2025
- The code implementation of the EMNLP 2022 paper "DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene… · ☆27 · Updated Nov 13, 2023
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models" · ☆11 · Updated Jun 18, 2024
- The official repository for the experiments included in the paper titled "Patch-level Routing in Mixture-of-Experts is Provably Sample-ef… · ☆14 · Updated Feb 12, 2026
- Investigating Cultural Alignment of Large Language Models · ☆13 · Updated Aug 14, 2024
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models · ☆35 · Updated Jun 12, 2024
- PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 · ☆1,231 · Updated Apr 19, 2024
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs · ☆94 · Updated Nov 17, 2024
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… · ☆56 · Updated Feb 28, 2023
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference · ☆162 · Updated Mar 25, 2022
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing · ☆14 · Updated Feb 10, 2023
- ☆13 · Updated Dec 17, 2021
- The official implementation of the ACL 2023 paper "Paraphrasing-Guided Data Augmentation for Contrastive Prompt-based Few-shot Fine-tuni… · ☆11 · Updated Nov 28, 2023
- A package for fine-tuning pretrained NLP transformers using semi-supervised learning · ☆14 · Updated Oct 27, 2021
- ☆13 · Updated Feb 26, 2023