OpenGVLab/Mono-InternVL

[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

☆109

Alternatives and similar repositories for Mono-InternVL

Users who are interested in Mono-InternVL are comparing it to the libraries listed below.

wzk1015 / WorldCupArena
⚽️🤖 Benchmarking LLMs and deep-research agents on real-world football prediction — from the tactical "who scores in minute 67" to the st…
☆18 · Updated this week
inclusionAI / Ming-UniVision
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
☆143 · Oct 14, 2025 · Updated 9 months ago
OpenGVLab / GenExam
[ICML 2026] GenExam: A Multidisciplinary Text-to-Image Exam
☆69 · May 26, 2026 · Updated last month
tliby / UniFork
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
☆48 · Aug 26, 2025 · Updated 10 months ago
YihongT / LLMSynthor
☆21 · Jul 3, 2025 · Updated last year
microsoft / BizGenEval
Bridging the gap between image generation and real-world design: a benchmark for structured, multi-constraint commercial visual content g…
☆20 · Apr 24, 2026 · Updated 2 months ago
VisionXLab / GRADE
[ECCV'26] GRADE: Grounded Reasoning Assessment for Discipline-informed Editing
☆28 · Apr 23, 2026 · Updated 2 months ago
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆37 · Dec 11, 2024 · Updated last year
mightyzau / InfMLLM
☆19 · Dec 6, 2023 · Updated 2 years ago
OpenGVLab / InternVL-U
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, image edit…
☆291 · Mar 21, 2026 · Updated 3 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆137 · Jan 30, 2026 · Updated 5 months ago
ByteVisionLab / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆464 · Aug 8, 2025 · Updated 11 months ago
TuringEyeTest / TuringEyeTest
Pixels, Patterns, but no Poetry: To See the World like Humans
☆18 · Aug 11, 2025 · Updated 11 months ago
VisionXLab / Rise-Video
RISE-Video: Can Video Generators Decode Implicit World Rules?
☆28 · Mar 26, 2026 · Updated 3 months ago
TIGER-AI-Lab / VideoEval-Pro
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
☆15 · Jun 1, 2026 · Updated last month
VisionXLab / SpaCE-10
[ICLR 2026] SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
☆20 · Jan 26, 2026 · Updated 5 months ago
VisionXLab / mllm-mmrotate
[IGARSS 2025 Oral] A Simple Aerial Detection Baseline of Multimodal Language Models.
☆92 · Feb 12, 2026 · Updated 5 months ago
saikrishna-prathapaneni / LowDINO
☆12 · Aug 19, 2023 · Updated 2 years ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆94 · Aug 8, 2025 · Updated 11 months ago
PKU-YuanGroup / UniWorld
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
☆883 · Dec 23, 2025 · Updated 6 months ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆54 · Oct 19, 2024 · Updated last year
sail-sg / OPER
Code for the paper Offline Prioritized Experience Replay
☆12 · Jun 13, 2023 · Updated 3 years ago
ModalMinds / MM-PRM
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
☆30 · May 26, 2025 · Updated last year
PhoenixZ810 / RISEBench
[NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆154 · May 18, 2026 · Updated 2 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,582 · Jun 14, 2025 · Updated last year
impiga / Plain-DETR
[ICCV2023] DETR Doesn’t Need Multi-Scale or Locality Design
☆232 · Nov 14, 2023 · Updated 2 years ago
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374 · Jul 24, 2025 · Updated 11 months ago
lifan724 / magic_eraser
☆20 · Jul 14, 2024 · Updated 2 years ago
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆425 · Apr 25, 2025 · Updated last year
jianzongwu / robust-ref-seg
(TIP 2024) Towards Robust Referring Image Segmentation
☆40 · Mar 2, 2024 · Updated 2 years ago
JiuhaiChen / BLIP3o
Official implementation of BLIP3o-Series
☆1,663 · Nov 29, 2025 · Updated 7 months ago
dfan / webssl
Code for Scaling Language-Free Visual Representation Learning (WebSSL)
☆244 · Apr 24, 2025 · Updated last year
MengLcool / SEGIC
[ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation".
☆27 · Oct 13, 2024 · Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆150 · Nov 14, 2024 · Updated last year
mlpc-ucsd / MasQCLIP
(ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation
☆37 · Oct 18, 2023 · Updated 2 years ago
inclusionAI / M2-Reasoning
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆47 · Jul 17, 2025 · Updated last year
SkyworkAI / DAQ-VS
Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]
☆15 · Jul 11, 2024 · Updated 2 years ago
wyhlovecpp / GPT-Image-Edit
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
☆243 · Aug 15, 2025 · Updated 11 months ago
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
☆73 · Oct 17, 2025 · Updated 9 months ago