Fsoft-AIC / LibMoELinks

LibMoE: A LIBRARY FOR COMPREHENSIVE BENCHMARKING MIXTURE OF EXPERTS IN LARGE LANGUAGE MODELS

☆43

Alternatives and similar repositories for LibMoE

Users that are interested in LibMoE are comparing it to the libraries listed below

Sorting:

pphuc25 / distil-cd
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation
☆35Updated last year
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆135Updated 3 months ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆108Updated this week
UCDvision / NOLA
Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"
☆56Updated last year
Yuheng2000 / Awesome-LoRA
Awesome Low-Rank Adaptation
☆48Updated 2 months ago
DavidFanzz / SCMoE
☆28Updated last year
arpita8 / Awesome-Mixture-of-Experts-Papers
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
☆135Updated last year
Dereck0602 / Awesome_Test_Time_LLMs
☆129Updated 7 months ago
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆82Updated 11 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆128Updated 3 months ago
EnnengYang / AdaMerging
AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024.
☆92Updated 11 months ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆46Updated last year
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆75Updated 3 months ago
TUDB-Labs / MixLoRA
State-of-the-art Parameter-Efficient MoE Fine-tuning Method
☆191Updated last year
iboing / CorDA
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning(NeurIPS 2024)
☆52Updated 9 months ago
harveyhuang18 / EMR_Merging
[NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging
☆69Updated 7 months ago
zefang-liu / AdaMoLE
AdaMoLE: Adaptive Mixture of LoRA Experts
☆37Updated last year
TUDB-Labs / MoE-PEFT
An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT
☆124Updated 7 months ago
CASE-Lab-UMD / LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆177Updated 6 months ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70Updated last year
prateeky2806 / ties-merging
☆194Updated last year
chuanyang-Zheng / DAPE
The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"
☆40Updated last year
stanfordmlgroup / ManyICL
☆142Updated last year
mrflogs / LoRA-Pro
Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized? "
☆132Updated 6 months ago
TsinghuaC3I / SoRA
[EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models
☆82Updated last year
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆44Updated last year
EvolvingLMMs-Lab / multimodal-sae
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆158Updated 3 weeks ago
r-three / smear
☆30Updated 2 years ago
yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆37Updated 4 months ago
UNITES-Lab / MC-SMoE
[ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆95Updated 3 months ago