yushuiwx / MH-MoE
☆15 · Updated 7 months ago
Alternatives and similar repositories for MH-MoE
Users interested in MH-MoE are comparing it to the repositories listed below.
- ☆15 · Updated last month
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)" ☆38 · Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆30 · Updated 2 years ago
- [arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆47 · Updated 5 months ago
- [NeurIPS 2024] Code and data repository for the paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning" ☆26 · Updated last year
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆38 · Updated 7 months ago
- Mixture of Attention Heads ☆44 · Updated 2 years ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆46 · Updated 6 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 · Updated 2 years ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆20 · Updated last week
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023) ☆35 · Updated last year
- Doodling our way to AGI ✏️ 🖼️ 🧠 ☆50 · Updated last week
- Official implementation for "Parameter-Efficient Fine-Tuning Design Spaces" ☆26 · Updated 2 years ago
- [ACL 2025] SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs; preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆21 · Updated last week
- Code for Merging Large Language Models ☆31 · Updated 9 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023 ☆33 · Updated last year
- Code for the ACL 2024 paper "MELoRA: Mini-Ensemble Low-Rank Adapter for Parameter-Efficient Fine-Tuning" ☆19 · Updated 3 months ago
- ☆83 · Updated last month
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆34 · Updated 10 months ago
- ☆99 · Updated last year
- A repository for DenseSSMs ☆87 · Updated last year
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379) ☆38 · Updated last year
- ☆17 · Updated last year
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆57 · Updated last year
- Official implementation for FlexAttention for Efficient High-Resolution Vision-Language Models ☆40 · Updated 4 months ago
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning ☆49 · Updated last year
- ICLR 2025 ☆26 · Updated 2 weeks ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆86 · Updated 6 months ago
- ☆29 · Updated last year
- Preference Learning for LLaVA ☆45 · Updated 6 months ago