invictus717 / MiCo
Explore the Limits of Omni-modal Pretraining at Scale
☆97 Updated 7 months ago
Alternatives and similar repositories for MiCo:
Users who are interested in MiCo are comparing it to the libraries listed below
- Official repository of MMDU dataset ☆89 Updated 6 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆131 Updated 5 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆91 Updated 9 months ago
- ☆115 Updated 8 months ago
- ☆133 Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆142 Updated 4 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ☆64 Updated 7 months ago
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning