[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
☆124 · Sep 2, 2024 · Updated last year
Alternatives and similar repositories for MiCo
Users that are interested in MiCo are comparing it to the libraries listed below.
- [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆101 · Mar 13, 2024 · Updated 2 years ago
- Towards Unified and Effective Domain Generalization ☆32 · Nov 27, 2023 · Updated 2 years ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ☆133 · Feb 7, 2024 · Updated 2 years ago
- ☆34 · Apr 11, 2025 · Updated last year
- Med-DANet Series (ECCV 2022 & WACV 2024) ☆13 · Jan 2, 2024 · Updated 2 years ago
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" ☆15 · Aug 9, 2023 · Updated 2 years ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ☆368 · Jul 24, 2025 · Updated 8 months ago
- ☆11 · Jun 22, 2024 · Updated last year
- ☆12 · Dec 20, 2024 · Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆130 · Nov 6, 2024 · Updated last year
- ☆79 · May 6, 2024 · Updated last year
- [CVPR 2024 & TPAMI 2025] UniRepLKNet ☆1,069 · Aug 10, 2025 · Updated 8 months ago
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. ☆323 · Jul 9, 2024 · Updated last year
- [AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augm… ☆12 · Dec 5, 2025 · Updated 4 months ago
- Next-Token Prediction is All You Need ☆2,393 · Jan 12, 2026 · Updated 3 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆158 · Dec 6, 2024 · Updated last year
- ☆147 · May 23, 2024 · Updated last year
- [ICME 2025 Oral] Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation ☆13 · Dec 23, 2025 · Updated 3 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆28 · Jul 15, 2025 · Updated 9 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,992 · Nov 7, 2025 · Updated 5 months ago
- [T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection ☆39 · Aug 29, 2023 · Updated 2 years ago
- Official implementation of the RSE paper mKGR. ☆20 · Jan 15, 2026 · Updated 2 months ago
- ☆19 · May 19, 2024 · Updated last year
- ☆12 · Aug 8, 2024 · Updated last year
- Distributed Optimization Infra for learning CLIP models ☆29 · Oct 3, 2024 · Updated last year
- Papers about the ultra high resolution tasks. ☆13 · Jul 12, 2024 · Updated last year
- ☆126 · Jul 29, 2024 · Updated last year
- Code and updates for the ScoreRS project. ☆42 · Sep 19, 2025 · Updated 6 months ago
- [TPAMI2024] Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery ☆15 · Mar 18, 2025 · Updated last year
- Multimodal Models in Real World ☆557 · Feb 24, 2025 · Updated last year
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆665 · Oct 22, 2024 · Updated last year
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆342 · Sep 14, 2025 · Updated 7 months ago
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding ☆518 · Nov 14, 2025 · Updated 5 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 · Nov 1, 2024 · Updated last year
- [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding ☆76 · Jun 26, 2025 · Updated 9 months ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions ☆2,923 · May 26, 2025 · Updated 10 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 · Dec 23, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆951 · Aug 5, 2025 · Updated 8 months ago