Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"
☆67 · Jul 30, 2024 · Updated last year
Alternatives and similar repositories for Dynamic_MoE
Users that are interested in Dynamic_MoE are comparing it to the libraries listed below
- ☆15 · Oct 19, 2024 · Updated last year
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆155 · Jul 9, 2025 · Updated 7 months ago
- This is code for the EMNLP 2022 Paper "UniRPG: Unified Discrete Reasoning over Table and Text as Program Generation". ☆10 · Apr 30, 2023 · Updated 2 years ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity ☆22 · Aug 28, 2025 · Updated 6 months ago
- ☆129 · Jun 6, 2025 · Updated 8 months ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆15 · Feb 4, 2025 · Updated last year
- Efficient Mixture of Experts for LLM Paper List ☆167 · Sep 28, 2025 · Updated 5 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆60 · Feb 7, 2025 · Updated last year
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆106 · Dec 20, 2024 · Updated last year
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆21 · Mar 15, 2025 · Updated 11 months ago
- ☆17 · Jun 11, 2025 · Updated 8 months ago
- ☆14 · Aug 8, 2022 · Updated 3 years ago
- ☆19 · Nov 5, 2024 · Updated last year
- 🦦 Source Code for EMNLP-22 findings paper "Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in… ☆21 · May 10, 2023 · Updated 2 years ago
- [ACL2025 Best Paper] Language Models Resist Alignment ☆43 · Jun 11, 2025 · Updated 8 months ago
- [ICML 2025 Oral] Mixture of Lookup Experts ☆72 · Dec 3, 2025 · Updated 2 months ago
- spatio-temporal tasks ☆16 · Jul 15, 2024 · Updated last year
- Official code implementation for our paper -- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models. ☆27 · Nov 18, 2022 · Updated 3 years ago
- ☆33 · Oct 4, 2024 · Updated last year
- [AAAI2025] Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration ☆33 · Feb 11, 2025 · Updated last year
- toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts ☆31 · Sep 1, 2024 · Updated last year
- Dataset published in paper "FinRED: A Dataset for Relation Extraction in Financial Domain" ☆30 · Apr 15, 2022 · Updated 3 years ago
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆32 · May 28, 2025 · Updated 9 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆227 · Nov 4, 2025 · Updated 3 months ago
- [ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts ☆265 · Oct 16, 2024 · Updated last year
- This repository contains an implementation of the 3D watermarking algorithm proposed by Cayre et al based on Spectral Decomposition. ☆11 · Jun 3, 2018 · Updated 7 years ago
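- GeWu: lightweight multilingual and Chinese large-scale pre-trained models, covering pure Chinese, knowledge-enhanced, and 113-language multilingual variants; built on the mainstream RoBERTa architecture, suitable for NLU and NLG tasks, and supporting PyTorch, TensorFlow, UER, Hugging Face, and other frameworks. Multilingual and Chinese … ☆30 · Nov 17, 2022 · Updated 3 years ago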
- ☆32 · Jul 31, 2023 · Updated 2 years ago
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆482 · Jul 23, 2025 · Updated 7 months ago
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,001 · Dec 6, 2024 · Updated last year
- ☆11 · Jul 7, 2020 · Updated 5 years ago
- Support for training SSD on TF2 ☆12 · Mar 29, 2023 · Updated 2 years ago
- This package is essentially a ros-wrapper of neural_cam. More features would be added in the future, geared towards mobile robot platform… ☆11 · Jul 12, 2019 · Updated 6 years ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆36 · Jun 7, 2024 · Updated last year
- indoor navigation attempts with RSSI fingerprinting and trilateration ☆12 · Nov 28, 2018 · Updated 7 years ago
- Use MobileNet SSD and openCV to detect and count car on road ☆12 · Jan 13, 2020 · Updated 6 years ago
- A CNN-BiLSTM model for Li-ion battery state of health and remaining useful life prediction ☆11 · Mar 25, 2024 · Updated last year
- [EMNLP 2022 Findings] Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study ☆34 · Feb 23, 2024 · Updated 2 years ago
- ☆91 · Aug 18, 2024 · Updated last year