[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆81Dec 27, 2025Updated 3 months ago
Alternatives and similar repositories for MoME
Users that are interested in MoME are comparing it to the libraries listed below.
- Official repo of M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning☆27Mar 23, 2025Updated last year
- [CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant☆27Dec 2, 2025Updated 3 months ago
- A large scale inpainting & t2i anime image dataset☆15Oct 18, 2025Updated 5 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆163Jun 8, 2024Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆67Feb 19, 2025Updated last year
- Multimodal Instruction Tuning with Conditional Mixture of LoRA (ACL 2024)☆32Aug 9, 2024Updated last year
- ☆17Aug 7, 2024Updated last year
- CLIP-MoE: Mixture of Experts for CLIP☆57Oct 10, 2024Updated last year
- Official code repository of Shuffle-R1☆25Feb 23, 2026Updated last month
- 【NeurIPS 2024】Dense Connector for MLLMs☆182Oct 14, 2024Updated last year
- Progressive Language-guided Visual Learning for Multi-Task Visual Grounding☆13May 9, 2025Updated 10 months ago
- AGIQA-1k-Database for AI Generated Content Image Quality Assessment☆29May 1, 2023Updated 2 years ago
- [CVPR 2025] Implementation of "Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models"☆37Apr 28, 2025Updated 11 months ago
- This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities☆92Mar 22, 2026Updated last week
- Official Implementation of VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Jo…☆23Jun 27, 2025Updated 9 months ago
- LVAS-Agent Code Base☆22Apr 15, 2025Updated 11 months ago
- ☆47Nov 8, 2024Updated last year
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆52Jul 24, 2025Updated 8 months ago
- [ACM MM 2025] DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models☆23Aug 6, 2025Updated 7 months ago
- Matryoshka Multimodal Models☆123Jan 22, 2025Updated last year
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆62Nov 7, 2024Updated last year
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"☆41Oct 9, 2025Updated 5 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆52Jul 16, 2024Updated last year
- Code for paper "Markovian Scale Prediction: A New Era of Visual Autoregressive Generation".☆31Feb 21, 2026Updated last month
- (TIP 2024) Towards Robust Referring Image Segmentation☆36Mar 2, 2024Updated 2 years ago
- Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection☆17Mar 19, 2024Updated 2 years ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆53Oct 19, 2024Updated last year
- [ICCV 2023 Workshop] The Official Implementation of The First Prize Solution for RVOS Competition☆14Jan 1, 2024Updated 2 years ago
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation☆33Mar 16, 2024Updated 2 years ago
- ☆22May 8, 2025Updated 10 months ago
- [ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…☆14Apr 17, 2025Updated 11 months ago
- Official PyTorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?' (ICLR 2024)☆13Mar 8, 2024Updated 2 years ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆86Nov 10, 2024Updated last year
- The released data for the paper entitled "FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models"☆54Jul 28, 2025Updated 8 months ago
- The official implementation of 'FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant'☆48Nov 22, 2024Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆62Aug 23, 2024Updated last year
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models☆123Jul 1, 2024Updated last year
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆276Jan 20, 2026Updated 2 months ago
- [ICCV 2025 Highlight] Less is More: Empowering GUI Agent with Context-Aware Simplification☆44Mar 12, 2026Updated 2 weeks ago