[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆82Dec 27, 2025Updated 3 months ago
Alternatives and similar repositories for MoME
Users that are interested in MoME are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official repo of M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning☆28Mar 23, 2025Updated last year
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation☆72Oct 17, 2025Updated 6 months ago
- [CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant☆27Dec 2, 2025Updated 4 months ago
- A large scale inpainting & t2i anime image dataset☆15Oct 18, 2025Updated 6 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆163Jun 8, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆67Feb 19, 2025Updated last year
- Multimodal Instruction Tuning with Conditional Mixture of LoRA (ACL 2024)☆32Aug 9, 2024Updated last year
- ☆18Aug 7, 2024Updated last year
- CLIP-MoE: Mixture of Experts for CLIP☆58Oct 10, 2024Updated last year
- Official code repository of Shuffle-R1☆25Feb 23, 2026Updated last month
- 【NeurIPS 2024】Dense Connector for MLLMs☆183Oct 14, 2024Updated last year
- Progressive Language-guided Visual Learning for Multi-Task Visual Grounding☆13May 9, 2025Updated 11 months ago
- [CVPR 2025] Implementation of "Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models"☆39Apr 28, 2025Updated 11 months ago
- ☆17Dec 1, 2023Updated 2 years ago
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- Official Implementation of VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Jo…☆23Jun 27, 2025Updated 9 months ago
- AGIQA-1k-Database for AI Generated Content Image Quality Assessment☆29May 1, 2023Updated 2 years ago
- UGround: Towards Unified Visual Grounding with Unrolled Transformers☆22Feb 15, 2026Updated 2 months ago
- LVAS-Agent Code Base☆22Apr 15, 2025Updated last year
- ☆47Nov 8, 2024Updated last year
- [ACM MM 2025] DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models☆23Aug 6, 2025Updated 8 months ago
- The implementation codes of paper: Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning☆20May 8, 2025Updated 11 months ago
- Matryoshka Multimodal Models☆123Jan 22, 2025Updated last year
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"☆43Oct 9, 2025Updated 6 months ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- [ICASSP2024] Code for paper "SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection"☆15Jul 6, 2024Updated last year
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆62Nov 7, 2024Updated last year
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆53Jul 24, 2025Updated 8 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆52Jul 16, 2024Updated last year
- (TIP 2024) Towards Robust Referring Image Segmentation☆36Mar 2, 2024Updated 2 years ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆53Oct 19, 2024Updated last year
- [ICCV 2023 Workshop] The Official Implementation of The First Prize Solution for RVOS Competition☆14Jan 1, 2024Updated 2 years ago
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation☆33Mar 16, 2024Updated 2 years ago
- [ICML‘25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…☆13Apr 17, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆22May 8, 2025Updated 11 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆86Nov 10, 2024Updated last year
- Official Pytorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning'? (ICLR2024)☆13Mar 8, 2024Updated 2 years ago
- The released data for the paper entilted "FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models"☆54Jul 28, 2025Updated 8 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆63Aug 23, 2024Updated last year
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models☆124Jul 1, 2024Updated last year
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆279Jan 20, 2026Updated 2 months ago