JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆79 · Dec 27, 2025 · Updated last month
Alternatives and similar repositories for MoME
Users interested in MoME are comparing it to the repositories listed below.
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation ☆70 · Oct 17, 2025 · Updated 3 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆67 · Feb 19, 2025 · Updated 11 months ago
- Official code repository of Shuffle-R1 ☆25 · Jan 27, 2026 · Updated 2 weeks ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆52 · Jul 24, 2025 · Updated 6 months ago
- CLIP-MoE: Mixture of Experts for CLIP ☆55 · Oct 10, 2024 · Updated last year
- ☆17 · Aug 7, 2024 · Updated last year
- Multimodal Instruction Tuning with Conditional Mixture of LoRA (ACL 2024) ☆31 · Aug 9, 2024 · Updated last year
- [NeurIPS 2024] Dense Connector for MLLMs ☆180 · Oct 14, 2024 · Updated last year
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆62 · Nov 7, 2024 · Updated last year
- Official implementation of VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Jo… ☆23 · Jun 27, 2025 · Updated 7 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆162 · Jun 8, 2024 · Updated last year
- ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities ☆90 · Jan 3, 2026 · Updated last month
- ☆46 · Nov 8, 2024 · Updated last year
- A large-scale inpainting & t2i anime image dataset ☆14 · Oct 18, 2025 · Updated 3 months ago
- Open-Vocabulary Panoptic Segmentation ☆27 · Jun 15, 2025 · Updated 7 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models ☆123 · Jul 1, 2024 · Updated last year
- [CVPR 2025] Implementation of "Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models" ☆31 · Apr 28, 2025 · Updated 9 months ago
- Progressive Language-guided Visual Learning for Multi-Task Visual Grounding ☆13 · May 9, 2025 · Updated 9 months ago
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025) ☆11 · Aug 11, 2025 · Updated 6 months ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆37 · Oct 9, 2025 · Updated 4 months ago
- Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model ☆13 · Dec 29, 2024 · Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆17 · Nov 4, 2025 · Updated 3 months ago
- Matryoshka Multimodal Models ☆122 · Jan 22, 2025 · Updated last year
- LVAS-Agent code base ☆22 · Apr 15, 2025 · Updated 9 months ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" ☆49 · Jan 30, 2026 · Updated 2 weeks ago
- Code for the ICME 2024 paper: Video Object Segmentation with Dynamic Query Modulation ☆12 · Mar 23, 2024 · Updated last year
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆27 · Dec 24, 2025 · Updated last month
- [ICML '25] Official code for the paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an… ☆12 · Apr 17, 2025 · Updated 9 months ago
- Code for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 · Dec 13, 2024 · Updated last year
- ☆12 · Dec 4, 2024 · Updated last year
- CoV: Chain-of-View Prompting for Spatial Reasoning ☆50 · Jan 23, 2026 · Updated 3 weeks ago
- Official implementation of Video-MA2MBA ☆12 · Dec 3, 2024 · Updated last year
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆52 · Oct 19, 2024 · Updated last year
- [NeurIPS 2024] Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" ☆61 · Dec 10, 2024 · Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆96 · Apr 14, 2025 · Updated 10 months ago
- [ICCV 2025] Dynamic-VLM ☆28 · Dec 16, 2024 · Updated last year
- [NeurIPS 2024] Official repository of Multi-Object Hallucination in Vision-Language Models ☆34 · Nov 13, 2024 · Updated last year
- AGIQA-1k database for AI-generated content image quality assessment ☆29 · May 1, 2023 · Updated 2 years ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models ☆85 · Nov 10, 2024 · Updated last year