This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆114May 2, 2022Updated 4 years ago
Alternatives and similar repositories for MoEBERT
Users that are interested in MoEBERT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆145Jul 21, 2024Updated last year
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"☆51Jul 17, 2022Updated 3 years ago
- [AAAI 2021] "ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques", Yuanxin Liu, Zheng Lin, Fengcheng Yuan☆14Oct 18, 2022Updated 3 years ago
- This package implements THOR: Transformer with Stochastic Experts.☆64Oct 7, 2021Updated 4 years ago
- This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).☆46Oct 17, 2022Updated 3 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)☆40Aug 28, 2023Updated 2 years ago
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers☆26Jun 7, 2023Updated 2 years ago
- ☆277Oct 31, 2023Updated 2 years ago
- ☆717Dec 6, 2025Updated 4 months ago
- ☆20Oct 31, 2022Updated 3 years ago
- Mixture of Attention Heads☆52Oct 10, 2022Updated 3 years ago
- Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)☆19Jul 28, 2021Updated 4 years ago
- The official implementation of the ACL 2023 paper, "Paraphrasing-Guided Data Augmentation for Contrastive Prompt-based Few-shot Fine-tuni…☆11Nov 28, 2023Updated 2 years ago
- [ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723☆729Aug 29, 2022Updated 3 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems"☆21Jul 18, 2023Updated 2 years ago
- ☆53Jan 19, 2023Updated 3 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- [NAACL 2022] "Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training", Yuanxin Liu, Fandong Meng, Zheng Lin, Pe…☆15Oct 18, 2022Updated 3 years ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆92Oct 15, 2024Updated last year
- Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"☆48May 25, 2022Updated 3 years ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models☆1,678Mar 8, 2024Updated 2 years ago
- The official repository for the experiments included in the paper titled "Patch-level Routing in Mixture-of-Experts is Provably Sample-ef…☆14Feb 12, 2026Updated 2 months ago
- ☆95Jul 26, 2023Updated 2 years ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- Paper Today I Read☆29Updated this week
- ☆13Jun 16, 2021Updated 4 years ago
- Code for "Simulated Multiple Reference Training Improves Low-Resource Machine Translation"☆15Dec 1, 2020Updated 5 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…☆44Aug 10, 2024Updated last year
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538☆1,242Apr 19, 2024Updated 2 years ago
- Implementation of AAAI 2022 Paper: Go wider instead of deeper☆32Oct 27, 2022Updated 3 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- code repo for EMNLP'21 Finding Counter-Interference Adapter for Multilingual Machine Translation☆18Oct 19, 2022Updated 3 years ago
- ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802☆98Aug 18, 2023Updated 2 years ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Code for COMET: Cardinality Constrained Mixture of Experts with Trees and Local Search☆11Jun 21, 2023Updated 2 years ago
- Code for ACL 2022 paper "BERT Learns to Teach: Knowledge Distillation with Meta Learning".☆86Aug 4, 2022Updated 3 years ago
- Synthetic data generation for TODs☆23Jul 17, 2024Updated last year
- Take neural networks as APIs for human-like AI.☆20Dec 4, 2019Updated 6 years ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)☆65Sep 28, 2024Updated last year
- Implementation of the ACL Findings paper "OutFlip: Generating Examples for Unknown Intent Detection with Natural Language Attack"☆10May 24, 2021Updated 4 years ago
- GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM☆183Jul 12, 2024Updated last year