☆19 · Oct 31, 2022 · Updated 3 years ago
Alternatives and similar repositories for EvoMoE
Users interested in EvoMoE are comparing it to the libraries listed below.
- The official repository for the experiments included in the paper titled "Patch-level Routing in Mixture-of-Experts is Provably Sample-ef… (☆14 · Feb 12, 2026 · Updated last month)
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)" (☆44 · Feb 28, 2026 · Updated 2 weeks ago)
- ☆19 · Sep 15, 2022 · Updated 3 years ago
- [IJCAI2023] An automated parallel training system that combines the advantages of both data and model parallelism. If you have any inte… (☆52 · May 31, 2023 · Updated 2 years ago)
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… (☆56 · Feb 28, 2023 · Updated 3 years ago)
- This package implements THOR: Transformer with Stochastic Experts. (☆64 · Oct 7, 2021 · Updated 4 years ago)
- Benchmarking and Analyzing Generative Data for Visual Recognition (☆26 · Jul 25, 2023 · Updated 2 years ago)
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … (☆124 · Dec 18, 2023 · Updated 2 years ago)
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers (☆26 · Jun 7, 2023 · Updated 2 years ago)
- PyTorch implementation of the paper "Debiasing the Cloze Task in Sequential Recommendation with Bidirectional Transformers". (☆12 · Jan 22, 2023 · Updated 3 years ago)
- ☆36 · Nov 13, 2020 · Updated 5 years ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- [MIDL 2023] Official implementation of "Making Your First Choice: To Address Cold Start Problem in Vision Active Learning" (☆36 · Aug 3, 2023 · Updated 2 years ago)
- Compression for Foundation Models (☆35 · Jul 21, 2025 · Updated 7 months ago)
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. (☆336 · Dec 13, 2025 · Updated 3 months ago)
- ☆18 · May 30, 2023 · Updated 2 years ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models (☆45 · Jun 14, 2024 · Updated last year)
- Source code of ACL 2023 Main Conference Paper "PAD-Net: An Efficient Framework for Dynamic Networks". (☆11 · Feb 28, 2026 · Updated 2 weeks ago)
- ☆16 · Mar 5, 2024 · Updated 2 years ago
- Spatial Mixture-of-Experts (☆21 · Nov 29, 2022 · Updated 3 years ago)
- Mixture of Attention Heads (☆52 · Oct 10, 2022 · Updated 3 years ago)
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024] (☆39 · May 28, 2024 · Updated last year)
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). (☆114 · May 2, 2022 · Updated 3 years ago)
- Examples for MS-AMP package. (☆30 · Jul 17, 2025 · Updated 8 months ago)
- ☆12 · Sep 29, 2019 · Updated 6 years ago
- ☆11 · Nov 14, 2021 · Updated 4 years ago
- Accommodating Large Language Model Training over Heterogeneous Environment. (☆25 · Mar 13, 2025 · Updated last year)
- yolo with rotated bounding boxes (☆15 · Sep 17, 2018 · Updated 7 years ago)
- RankFormer: Listwise Learning-to-Rank Using Listwide Labels (KDD 2023). (☆27 · Sep 12, 2023 · Updated 2 years ago)
- Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution (☆26 · Mar 18, 2021 · Updated 5 years ago)
- ☆13 · Nov 19, 2020 · Updated 5 years ago
- TF4CTR: Twin Focus Framework for CTR Prediction via Adaptive Sample Differentiation (☆17 · Feb 20, 2025 · Updated last year)
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs (☆23 · Nov 11, 2025 · Updated 4 months ago)
- Code and Model for NeurIPS 2024 Spotlight Paper "Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training… (☆44 · Oct 16, 2024 · Updated last year)
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021). (☆59 · Jan 14, 2022 · Updated 4 years ago)
- [ACL'25 Main] Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs (☆40 · May 26, 2025 · Updated 9 months ago)
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL 2024) (☆51 · Mar 17, 2024 · Updated 2 years ago)
- ☆29 · Apr 22, 2024 · Updated last year
- ☆12 · Oct 17, 2022 · Updated 3 years ago