This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆114May 2, 2022Updated 3 years ago
Alternatives and similar repositories for MoEBERT
Users that are interested in MoEBERT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆145Jul 21, 2024Updated last year
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"☆51Jul 17, 2022Updated 3 years ago
- This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).☆46Oct 17, 2022Updated 3 years ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)☆40Aug 28, 2023Updated 2 years ago
- Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4☆980Updated this week
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers☆26Jun 7, 2023Updated 2 years ago
- ☆275Oct 31, 2023Updated 2 years ago
- ☆715Dec 6, 2025Updated 4 months ago
- Mixture of Attention Heads☆52Oct 10, 2022Updated 3 years ago
- Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)☆19Jul 28, 2021Updated 4 years ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":☆45Feb 28, 2026Updated last month
- The official implementation of the ACL 2023 paper, "Paraphrasing-Guided Data Augmentation for Contrastive Prompt-based Few-shot Fine-tuni…☆11Nov 28, 2023Updated 2 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems"☆21Jul 18, 2023Updated 2 years ago
- A fast MoE impl for PyTorch☆1,847Feb 10, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆53Jan 19, 2023Updated 3 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- [NAACL 2022] "Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training", Yuanxin Liu, Fandong Meng, Zheng Lin, Pe…☆15Oct 18, 2022Updated 3 years ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆91Oct 15, 2024Updated last year
- Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"☆48May 25, 2022Updated 3 years ago
- The official repository for the experiments included in the paper titled "Patch-level Routing in Mixture-of-Experts is Provably Sample-ef…☆14Feb 12, 2026Updated 2 months ago
- ☆95Jul 26, 2023Updated 2 years ago
- ☆13Jun 16, 2021Updated 4 years ago
- Code for "Simulated Multiple Reference Training Improves Low-Resource Machine Translation"☆15Dec 1, 2020Updated 5 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- [EMNLP 2023]Context Compression for Auto-regressive Transformers with Sentinel Tokens☆25Nov 6, 2023Updated 2 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…☆44Aug 10, 2024Updated last year
- Implementation of AAAI 2022 Paper: Go wider instead of deeper☆32Oct 27, 2022Updated 3 years ago
- ☆15Nov 23, 2023Updated 2 years ago
- code repo for EMNLP'21 Finding Counter-Interference Adapter for Multilingual Machine Translation☆18Oct 19, 2022Updated 3 years ago
- ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802☆98Aug 18, 2023Updated 2 years ago
- ☆13Nov 10, 2021Updated 4 years ago
- Code for ACL 2022 paper "BERT Learns to Teach: Knowledge Distillation with Meta Learning".☆86Aug 4, 2022Updated 3 years ago
- Implementation of the ACL Findings paper "OutFlip: Generating Examples for Unknown Intent Detection with Natural Language Attack"☆10May 24, 2021Updated 4 years ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆50Jun 16, 2023Updated 2 years ago
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408☆198May 9, 2023Updated 2 years ago
- GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM☆181Jul 12, 2024Updated last year
- ☆30Sep 28, 2023Updated 2 years ago
- Deep learning images developed from nvidia/cuda-cudnn-devel-ubuntu.☆23Aug 24, 2022Updated 3 years ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆139Jun 12, 2024Updated last year
- The code implementation of the EMNLP2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene…☆27Nov 13, 2023Updated 2 years ago