yushuiwx / MH-MoELinks

☆15

Alternatives and similar repositories for MH-MoE

Users that are interested in MH-MoE are comparing it to the libraries listed below

Sorting:

waltonfuture / MM-UPT
Unsupervised GRPO
☆41Updated 2 months ago
amazon-science / peft-design-spaces
Official implementation for "Parameter-Efficient Fine-Tuning Design Spaces"
☆27Updated 2 years ago
yikangshen / MoA
Mixture of Attention Heads
☆48Updated 2 years ago
QizhiPei / MathFusion
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)
☆29Updated 3 weeks ago
OpenSparseLLMs / LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
☆86Updated 8 months ago
Kamichanw / CoS
[ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"
☆23Updated last month
Hongcheng-Gao / HAVEN
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆17Updated 2 months ago
WailordHe / DenseSSM
A repository for DenseSSMs
☆88Updated last year
core-mm / core-mm
☆17Updated last year
aeroplanepaper / GRPO-LEAD
☆24Updated 3 months ago
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆38Updated last year
bwconrad / soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆60Updated last year
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆86Updated 2 years ago
Richar-Du / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆20Updated 2 months ago
vlf-silkie / VLFeedback
☆100Updated last year
palchenli / VL-Instruction-Tuning
☆91Updated last year
EIT-NLP / Distilling-CoT-Reasoning
[ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".
☆18Updated 5 months ago
waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆51Updated 2 months ago
tianyi-lab / Mosaic-IT
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
☆19Updated last month
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆41Updated 7 months ago
Leezekun / MMSci
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
☆45Updated 8 months ago
AoiDragon / Awesome-Text-Diffusion-Models
[IJCAI'23] The official Github page of the paper "Diffusion Models for Non-autoregressive Text Generation: A Survey".
☆31Updated last year
gyhdog99 / MoCLE
MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)
☆42Updated last month
ZHZisZZ / weak-to-strong-search
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
☆62Updated 8 months ago
xiangyu-mm / EasyGen
The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"
☆74Updated 8 months ago
yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆33Updated last year
Pbihao / SLM
☆28Updated last year
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆72Updated this week
CMMMU-Benchmark / CMMMU
☆48Updated 11 months ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70Updated last year