Fsoft-AIC / LibMoE
LibMoE: A Library for Comprehensive Benchmarking Mixture of Experts in Large Language Models
☆39 · Updated last month
Alternatives and similar repositories for LibMoE
Users interested in LibMoE are comparing it to the libraries listed below.
- Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation ☆35 · Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆42 · Updated 7 months ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆38 · Updated 7 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models ☆74 · Updated 6 months ago
- ☆179 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆52 · Updated 4 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆55 · Updated 9 months ago
- Official code for our paper "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆117 · Updated last month
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆59 · Updated 3 months ago
- Is gradient information useful for pruning LLMs? ☆45 · Updated last year
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆64 · Updated 3 months ago
- [ICLR 2025] 🚀 CodeMMLU Evaluator: A framework for evaluating language models on the CodeMMLU MCQ benchmark ☆23 · Updated last month
- ☆105 · Updated 2 months ago
- ☆81 · Updated 2 months ago
- [Preprint 2025] Thinkless: LLM Learns When to Think ☆125 · Updated this week
- [ICLR 2025] CAMEx: Curvature-Aware Merging of Experts ☆20 · Updated 3 months ago
- [NAACL 2025] Towards Rationality in Language and Multimodal Agents: A Survey ☆28 · Updated 3 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆68 · Updated last year
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration ☆38 · Updated 11 months ago
- Code for paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts" ☆24 · Updated last year
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆56 · Updated 5 months ago
- Official repo of M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning ☆23 · Updated 2 months ago
- [NeurIPS 2024 Main Track] Code for the paper "Instruction Tuning With Loss Over Instructions" ☆37 · Updated last year
- ☆42 · Updated 6 months ago
- A pioneering Vietnamese Multimodal Large Language Model ☆47 · Updated 4 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆32 · Updated last year
- Awesome Low-Rank Adaptation ☆39 · Updated 9 months ago
- ☆27 · Updated last year
- [ACL 2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ☆77 · Updated last week
- ☆16 · Updated 4 months ago