ZhenweiAn / Dynamic_MoE
Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"
☆22 · Updated 3 months ago
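The paper's core idea is that each token activates a variable number of experts: confident routing decisions use few experts, uncertain ("harder") ones use more. Below is a minimal, illustrative PyTorch sketch of threshold-based (top-p) expert selection in that spirit; the function and variable names are mine, not this repo's API, and the actual implementation may differ.

```python
import torch
import torch.nn.functional as F

def dynamic_top_p_routing(router_logits: torch.Tensor, p: float = 0.5):
    """Select a variable number of experts per token: take experts in
    descending router-probability order until cumulative mass reaches p.

    router_logits: (num_tokens, num_experts)
    Returns a boolean mask (num_tokens, num_experts) of selected experts
    and the full routing probabilities.
    """
    probs = F.softmax(router_logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    # Cumulative probability mass *before* each expert in sorted order;
    # an expert is kept while that mass is still below the threshold p,
    # so the expert that crosses the threshold is included.
    cum_before = sorted_probs.cumsum(dim=-1) - sorted_probs
    keep_sorted = cum_before < p
    # Map the sorted-order mask back to the original expert order.
    mask = torch.zeros_like(probs).scatter(-1, sorted_idx, keep_sorted.float()).bool()
    return mask, probs

# Tokens with flatter router distributions ("harder") activate more experts:
logits = torch.tensor([[4.0, 0.1, 0.0, 0.0],   # confident -> 1 expert
                       [1.0, 0.9, 0.8, 0.7]])  # uncertain -> 2 experts at p=0.5
mask, _ = dynamic_top_p_routing(logits, p=0.5)
print(mask.sum(dim=-1))  # experts activated per token
```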
Related projects
Alternatives and complementary repositories for Dynamic_MoE
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆31 · Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" ☆100 · Updated 2 weeks ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continual pre-training improves … ☆27 · Updated 3 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆38 · Updated 4 months ago
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ☆88 · Updated last month
- [SIGIR'24] The official implementation code of MOELoRA ☆127 · Updated 4 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆125 · Updated 2 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆73 · Updated 8 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆147 · Updated 5 months ago
- Repo for the EMNLP'24 paper "Dual-Space Knowledge Distillation for Large Language Models" ☆37 · Updated 2 weeks ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆24 · Updated 4 months ago
- Code and data for the paper JiuZhang3.0 ☆35 · Updated 5 months ago
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆29 · Updated 10 months ago
- Code for the ACL 2024 paper "MELoRA: Mini-Ensemble Low-Rank Adapter for Parameter-Efficient Fine-Tuning" ☆15 · Updated 6 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆36 · Updated 3 months ago
- [ICML 2024] Can AI Assistants Know What They Don't Know? ☆70 · Updated 9 months ago
- We introduce ScaleQuest, a novel, scalable, and cost-effective data synthesis method to unleash the reasoning capability of LLMs ☆51 · Updated 3 weeks ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆35 · Updated 7 months ago
- MoCLE, the first MLLM with MoE for instruction customization and generalization (https://arxiv.org/abs/2312.12379) ☆29 · Updated 7 months ago
- 🍼 Official implementation of "Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts" ☆34 · Updated last month