This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆114May 2, 2022Updated 4 years ago
Alternatives and similar repositories for MoEBERT
Users that are interested in MoEBERT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆145Jul 21, 2024Updated last year
- [AAAI 2021] "ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques", Yuanxin Liu, Zheng Lin, Fengcheng Yuan☆14Oct 18, 2022Updated 3 years ago
- This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).☆46Oct 17, 2022Updated 3 years ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)☆40Aug 28, 2023Updated 2 years ago
- Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4☆991May 13, 2026Updated last week
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers☆26Jun 7, 2023Updated 2 years ago
- ☆276Oct 31, 2023Updated 2 years ago
- ☆717Dec 6, 2025Updated 5 months ago
- Mixture of Attention Heads☆52Oct 10, 2022Updated 3 years ago
- Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)☆19Jul 28, 2021Updated 4 years ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":☆46Feb 28, 2026Updated 2 months ago
- [ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723☆728Aug 29, 2022Updated 3 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems"☆21Jul 18, 2023Updated 2 years ago
- A fast MoE impl for PyTorch☆1,850Feb 10, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- Inference framework for MoE layers based on TensorRT with Python binding☆41May 31, 2021Updated 4 years ago
- ☆54Jan 19, 2023Updated 3 years ago
- [NAACL 2022] "Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training", Yuanxin Liu, Fandong Meng, Zheng Lin, Pe…☆15Oct 18, 2022Updated 3 years ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆92Oct 15, 2024Updated last year
- Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"☆48May 25, 2022Updated 4 years ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models☆1,679Mar 8, 2024Updated 2 years ago
- ☆95Jul 26, 2023Updated 2 years ago
- ☆13Jun 16, 2021Updated 4 years ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…☆45Aug 10, 2024Updated last year
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538☆1,243Apr 19, 2024Updated 2 years ago
- Implementation of AAAI 2022 Paper: Go wider instead of deeper☆32Oct 27, 2022Updated 3 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- ☆15Nov 23, 2023Updated 2 years ago
- code repo for EMNLP'21 Finding Counter-Interference Adapter for Multilingual Machine Translation☆18Oct 19, 2022Updated 3 years ago
- ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802☆98Aug 18, 2023Updated 2 years ago
- ☆13Nov 10, 2021Updated 4 years ago
- Code for COMET: Cardinality Constrained Mixture of Experts with Trees and Local Search☆11Jun 21, 2023Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Code for ACL 2022 paper "BERT Learns to Teach: Knowledge Distillation with Meta Learning".☆86Aug 4, 2022Updated 3 years ago
- Synthetic data generation for TODs☆23Jul 17, 2024Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)☆65Sep 28, 2024Updated last year
- Implementation of the ACL Findings paper "OutFlip: Generating Examples for Unknown Intent Detection with Natural Language Attack"☆10May 24, 2021Updated 5 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆50Jun 16, 2023Updated 2 years ago
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408☆198May 9, 2023Updated 3 years ago
- GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM☆183Jul 12, 2024Updated last year