[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆79 · Dec 27, 2025 · Updated 2 months ago
Alternatives and similar repositories for MoME
Users who are interested in MoME are comparing it to the repositories listed below
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation · ☆71 · Oct 17, 2025 · Updated 4 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression · ☆67 · Feb 19, 2025 · Updated last year
- Official code repository of Shuffle-R1 · ☆25 · Feb 23, 2026 · Updated 2 weeks ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… · ☆52 · Jul 24, 2025 · Updated 7 months ago
- CLIP-MoE: Mixture of Experts for CLIP · ☆55 · Oct 10, 2024 · Updated last year
- ☆17 · Aug 7, 2024 · Updated last year
- [NeurIPS 2024] Dense Connector for MLLMs · ☆181 · Oct 14, 2024 · Updated last year
- Official repo of M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning · ☆27 · Mar 23, 2025 · Updated 11 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… · ☆62 · Nov 7, 2024 · Updated last year
- Official Implementation of VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Jo… · ☆23 · Jun 27, 2025 · Updated 8 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts · ☆162 · Jun 8, 2024 · Updated last year
- A large scale inpainting & t2i anime image dataset · ☆15 · Oct 18, 2025 · Updated 4 months ago
- This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities · ☆92 · Jan 3, 2026 · Updated 2 months ago
- ☆47 · Nov 8, 2024 · Updated last year
- Open-Vocabulary Panoptic Segmentation · ☆27 · Jun 15, 2025 · Updated 8 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models · ☆123 · Jul 1, 2024 · Updated last year
- Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model · ☆13 · Dec 29, 2024 · Updated last year
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025) · ☆12 · Aug 11, 2025 · Updated 6 months ago
- Progressive Language-guided Visual Learning for Multi-Task Visual Grounding · ☆13 · May 9, 2025 · Updated 10 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models · ☆17 · Nov 4, 2025 · Updated 4 months ago
- Matryoshka Multimodal Models · ☆122 · Jan 22, 2025 · Updated last year
- Code of ICME2024 Paper: Video Object Segmentation with Dynamic Query Modulation · ☆12 · Mar 23, 2024 · Updated last year
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" · ☆49 · Jan 30, 2026 · Updated last month
- [ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an… · ☆13 · Apr 17, 2025 · Updated 10 months ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" · ☆38 · Oct 9, 2025 · Updated 5 months ago
- LVAS-Agent Code Base · ☆22 · Apr 15, 2025 · Updated 10 months ago
- Official Implementation of Video-MA2MBA · ☆12 · Dec 3, 2024 · Updated last year
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" · ☆13 · Dec 13, 2024 · Updated last year
- Official PyTorch implementation of “MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation” · ☆18 · Dec 5, 2024 · Updated last year
- UGround: Towards Unified Visual Grounding with Unrolled Transformers · ☆21 · Feb 15, 2026 · Updated 3 weeks ago
- ☆12 · Dec 4, 2024 · Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." · ☆53 · Oct 19, 2024 · Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos · ☆97 · Apr 14, 2025 · Updated 10 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models · ☆34 · Nov 13, 2024 · Updated last year
- [CVPR 2025] Implementation of "Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models" · ☆36 · Apr 28, 2025 · Updated 10 months ago
- [ICCV 2025] Dynamic-VLM · ☆28 · Dec 16, 2024 · Updated last year
- AGIQA-1k-Database for AI Generated Content Image Quality Assessment · ☆29 · May 1, 2023 · Updated 2 years ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. · ☆86 · Nov 10, 2024 · Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] · ☆62 · Dec 10, 2024 · Updated last year