yushuiwx / MH-MoE
☆15 · Updated 9 months ago
Alternatives and similar repositories for MH-MoE
Users interested in MH-MoE are comparing it to the libraries listed below.
- Unsupervised GRPO ☆41 · Updated 2 months ago
- Official implementation for "Parameter-Efficient Fine-Tuning Design Spaces" ☆27 · Updated 2 years ago
- Mixture of Attention Heads ☆48 · Updated 2 years ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆29 · Updated 3 weeks ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆86 · Updated 8 months ago
- [ICML'25] Official code of the paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆23 · Updated last month
- Code and data for the paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation" ☆17 · Updated 2 months ago
- A repository for DenseSSMs ☆88 · Updated last year
- ☆17 · Updated last year
- ☆24 · Updated 3 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆38 · Updated last year
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆60 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆86 · Updated 2 years ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆20 · Updated 2 months ago
- ☆100 · Updated last year
- ☆91 · Updated last year
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning" ☆18 · Updated 5 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆51 · Updated 2 months ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning ☆19 · Updated last month
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models ☆41 · Updated 7 months ago
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension ☆45 · Updated 8 months ago
- [IJCAI'23] The official GitHub page of the paper "Diffusion Models for Non-autoregressive Text Generation: A Survey" ☆31 · Updated last year
- MoCLE (first MLLM with MoE for instruction customization and generalization; https://arxiv.org/abs/2312.12379) ☆42 · Updated last month
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆62 · Updated 8 months ago
- The official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆74 · Updated 8 months ago
- Codes for Merging Large Language Models ☆33 · Updated last year
- ☆28 · Updated last year
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆72 · Updated this week
- ☆48 · Updated 11 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆70 · Updated last year