This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆114May 2, 2022Updated 4 years ago
Alternatives and similar repositories for MoEBERT
Users that are interested in MoEBERT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆145Jul 21, 2024Updated last year
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"☆51Jul 17, 2022Updated 3 years ago
- [AAAI 2021] "ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques", Yuanxin Liu, Zheng Lin, Fengcheng Yuan☆14Oct 18, 2022Updated 3 years ago
- This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).☆46Oct 17, 2022Updated 3 years ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)☆40Aug 28, 2023Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4☆992Jun 4, 2026Updated last week
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers☆26Jun 7, 2023Updated 3 years ago
- ☆276Oct 31, 2023Updated 2 years ago
- ☆724Jun 6, 2026Updated last week
- ☆21Oct 31, 2022Updated 3 years ago
- Mixture of Attention Heads☆52Oct 10, 2022Updated 3 years ago
- Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)☆19Jul 28, 2021Updated 4 years ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":☆46Feb 28, 2026Updated 3 months ago
- [ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723☆727Aug 29, 2022Updated 3 years ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems"☆21Jul 18, 2023Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated 2 years ago
- ☆54Jan 19, 2023Updated 3 years ago
- [NAACL 2022] "Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training", Yuanxin Liu, Fandong Meng, Zheng Lin, Pe…☆15Oct 18, 2022Updated 3 years ago
- Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"☆48May 25, 2022Updated 4 years ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models☆1,684Mar 8, 2024Updated 2 years ago
- ☆95Jul 26, 2023Updated 2 years ago
- ☆13Jun 16, 2021Updated 4 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…☆45Aug 10, 2024Updated last year
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- ☆15Nov 23, 2023Updated 2 years ago
- code repo for EMNLP'21 Finding Counter-Interference Adapter for Multilingual Machine Translation☆18Oct 19, 2022Updated 3 years ago
- ☆13Nov 10, 2021Updated 4 years ago
- Code for COMET: Cardinality Constrained Mixture of Experts with Trees and Local Search☆11Jun 21, 2023Updated 2 years ago
- Take neural networks as APIs for human-like AI.☆20Dec 4, 2019Updated 6 years ago
- Synthetic data generation for TODs☆23Jul 17, 2024Updated last year
- Implementation of the ACL Findings paper "OutFlip: Generating Examples for Unknown Intent Detection with Natural Language Attack"☆10May 24, 2021Updated 5 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆50Jun 16, 2023Updated 2 years ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408☆198May 9, 2023Updated 3 years ago
- GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM☆182Jul 12, 2024Updated last year
- ☆30Sep 28, 2023Updated 2 years ago
- Deep learning images developed from nvidia/cuda-cudnn-devel-ubuntu.☆23Aug 24, 2022Updated 3 years ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆140Jun 12, 2024Updated 2 years ago
- Code of Robust Lottery Tickets for Pre-trained Language Models (ACL2022)☆20Jul 18, 2022Updated 3 years ago
- Investigating Cultural Alignment of Large Language Models☆13Aug 14, 2024Updated last year