invictus717/MiCo

invictus717 / MiCo

[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale

☆124

Alternatives and similar repositories for MiCo

Users who are interested in MiCo are comparing it to the repositories listed below.

AILab-CVC / M2PT
[CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
☆101 • Mar 13, 2024 • Updated 2 years ago
invictus717 / UniDG
Towards Unified and Effective Domain Generalization
☆34 • Nov 27, 2023 • Updated 2 years ago
invictus717 / InteractiveVideo
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
☆133 • Feb 7, 2024 • Updated 2 years ago
zehanwang01 / OmniBind
☆34 • Apr 11, 2025 • Updated last year
Rubics-Xuan / Med-DANet
Med-DANet Series (ECCV 2022 & WACV 2024)
☆13 • Jan 2, 2024 • Updated 2 years ago
CASIA-IVA-Lab / OPT_Questioner
Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
☆15 • Aug 9, 2023 • Updated 2 years ago
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374 • Jul 24, 2025 • Updated last year
YBIO / FineGrip
☆11 • Jun 22, 2024 • Updated 2 years ago
GeoX-Lab / RS-GPT4V
☆37 • Jul 1, 2024 • Updated 2 years ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆128 • Nov 6, 2024 • Updated last year
THUNLP-MT / ActiView
☆11 • Dec 20, 2024 • Updated last year
xverse-ai / XVERSE-V-13B
☆78 • May 6, 2024 • Updated 2 years ago
AILab-CVC / UniRepLKNet
[CVPR 2024 & TPAMI 2025] UniRepLKNet
☆1,072 • Aug 10, 2025 • Updated 11 months ago
FoundationVision / OmniTokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
☆325 • Jul 9, 2024 • Updated 2 years ago
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 • Oct 11, 2024 • Updated last year
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,433 • Jan 12, 2026 • Updated 6 months ago
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 • Dec 6, 2024 • Updated last year
stanfordmlgroup / ManyICL
☆147 • May 23, 2024 • Updated 2 years ago
MCG-NJU / TemporalPerceiver
[T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
☆39 • Aug 29, 2023 • Updated 2 years ago
UCSB-AI / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 • Jul 15, 2025 • Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 • Nov 7, 2025 • Updated 8 months ago
BriansIDP / AudioVisualLLM
☆19 • May 19, 2024 • Updated 2 years ago
Sirius-Li / UAVGen
Source code for Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection (CVPR 26)
☆17 • Mar 10, 2026 • Updated 4 months ago
zhangguanghao523 / CMMCoT
[AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augm…
☆11 • Dec 5, 2025 • Updated 7 months ago
lartpang / UltraHighResolution
Papers on ultra-high-resolution tasks.
☆13 • Jul 12, 2024 • Updated 2 years ago
aburns4 / textualforesight
☆12 • Aug 8, 2024 • Updated last year
XavierJiezou / KTDA
[ICME 2025 Oral] Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation
☆14 • Dec 23, 2025 • Updated 7 months ago
MitsuiChen14 / DGTRS
☆32 • Jun 10, 2026 • Updated last month
Optimization-AI / FastCLIP
Distributed Optimization Infra for learning CLIP models
☆31 • Oct 3, 2024 • Updated last year
csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆666 • Oct 22, 2024 • Updated last year
AILab-CVC / SEED-X
Multimodal Models in Real World
☆558 • Feb 24, 2025 • Updated last year
BAAI-DCAI / SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models."
☆349 • Sep 14, 2025 • Updated 10 months ago
invictus717 / MetaTransformer
Meta-Transformer for Unified Multimodal Learning
☆1,650 • Dec 5, 2023 • Updated 2 years ago
Xavier-Lin / TrackSSM
The official implementation of TrackSSM
☆40 • Oct 13, 2024 • Updated last year
Luo-Z13 / GLH-Bridge-page
[TPAMI 2024] Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery
☆15 • Mar 18, 2025 • Updated last year
walker-hyf / NCSSD
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆61 • Nov 1, 2024 • Updated last year
facebookresearch / EgoObjects
[ICCV 2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding
☆86 • Oct 6, 2023 • Updated 2 years ago
lavinal712 / Awesome-Visual-Tokenizers
📖 This is a repository for organizing papers, code, and other resources related to visual tokenizers.
☆17 • Jul 7, 2026 • Updated 2 weeks ago
InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921 • May 26, 2025 • Updated last year