[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
☆123Sep 2, 2024Updated last year
Alternatives and similar repositories for MiCo
Users that are interested in MiCo are comparing it to the libraries listed below
- [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities☆101Mar 13, 2024Updated last year
- ☆33Apr 11, 2025Updated 10 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Jul 15, 2025Updated 7 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI☆368Jul 24, 2025Updated 7 months ago
- ☆36Jul 1, 2024Updated last year
- ☆79May 6, 2024Updated last year
- ☆146May 23, 2024Updated last year
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.☆321Jul 9, 2024Updated last year
- Delta-CoMe can achieve near-lossless 1-bit compression; accepted by NeurIPS 2024☆58Nov 16, 2024Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Dec 23, 2024Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆130Nov 6, 2024Updated last year
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research☆22Sep 23, 2025Updated 5 months ago
- ☆11Jun 22, 2024Updated last year
- ☆12Dec 20, 2024Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- ☆12Mar 5, 2025Updated 11 months ago
- Papers about the ultra high resolution tasks.☆13Jul 12, 2024Updated last year
- Next-Token Prediction is All You Need☆2,355Jan 12, 2026Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.☆1,986Nov 7, 2025Updated 3 months ago
- [CVPR 2024 & TPAMI 2025] UniRepLKNet☆1,068Aug 10, 2025Updated 6 months ago
- AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents☆62Updated this week
- Official implementation of the RSE paper mKGR.☆20Jan 15, 2026Updated last month
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates☆22Jul 1, 2025Updated 8 months ago
- ☆34Jan 25, 2026Updated last month
- ☆10Oct 24, 2024Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …☆11Sep 27, 2024Updated last year
- Code and updates for the ScoreRS project.☆40Sep 19, 2025Updated 5 months ago
- Distributed Optimization Infra for learning CLIP models☆27Oct 3, 2024Updated last year
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)☆86Feb 27, 2025Updated last year
- Multimodal Models in Real World☆556Feb 24, 2025Updated last year
- ☆124Jul 29, 2024Updated last year
- Code repository for "Parameter Efficient Self-supervised Geospatial Domain Adaptation", CVPR 2024☆36Jul 29, 2024Updated last year
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Sep 26, 2024Updated last year
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding☆512Nov 14, 2025Updated 3 months ago
- CounterGeDi is a pipeline that aims at controlling the counter speech generated to make it emotional, polite and detoxified. Paper accept…☆11Jul 19, 2022Updated 3 years ago
- Official code of "RoboOmni: Proactive Robot Manipulation in Omni-modal Context"☆89Nov 17, 2025Updated 3 months ago
- "Visual Prompt Selection for In-Context Learning Segmentation Framework"☆15Dec 13, 2024Updated last year
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding☆34Jan 16, 2026Updated last month