OpenGVLab/InternVL-U

InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, and image editing into a single framework.

☆291

Alternatives and similar repositories for InternVL-U

Users that are interested in InternVL-U are comparing it to the libraries listed below.

open-compass / TextEdit
We provide TextEdit, a high-quality, multi-scenario text editing benchmark for generation models.
☆20 · Mar 16, 2026 · Updated 4 months ago
VisionXLab / GRADE
[ECCV'26] GRADE: Grounded Reasoning Assessment for Discipline-informed Editing
☆28 · Apr 23, 2026 · Updated 2 months ago
Visionary-Laboratory / SpaceDG
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation
☆31 · Jul 9, 2026 · Updated last week
open-compass / GenEditEvalKit
The first unified, efficient, and extensible evaluation toolkit for evaluating image generation and editing models across multiple benchm…
☆50 · Apr 12, 2026 · Updated 3 months ago
VisionXLab / Rise-Video
RISE-Video: Can Video Generators Decode Implicit World Rules?
☆28 · Mar 26, 2026 · Updated 3 months ago
VisionXLab / FIRM-Reward
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
☆40 · Mar 13, 2026 · Updated 4 months ago
Visionary-Laboratory / PhotoFlow
PhotoFlow: Agentic 3D Virtual Photography Missions
☆38 · May 27, 2026 · Updated last month
Visionary-Laboratory / CourtSI
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports
☆70 · Mar 15, 2026 · Updated 4 months ago
hmwang2002 / CTRL-S
[ECCV 2026] Official repository of "Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning".
☆22 · Updated this week
VisionXLab / EvoTok
[ECCV'26] Code repo for "EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation"
☆22 · Jun 18, 2026 · Updated last month
VisionXLab / Moment-Video
☆18 · Jun 2, 2026 · Updated last month
facebookresearch / tuna-2
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
☆738 · Updated this week
Luo-Yihong / TDM-R1
[ICML 2026][Ultra Powerful Few-Step Diffusion RL] TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
☆116 · May 25, 2026 · Updated last month
hmwang2002 / InternSVG
[ICLR 2026] Official repository of "InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models".
☆120 · Feb 6, 2026 · Updated 5 months ago
bytedance / Lance
A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.
☆1,281 · Updated this week
microsoft / BizGenEval
Bridging the gap between image generation and real-world design: a benchmark for structured, multi-constraint commercial visual content g…
☆20 · Apr 24, 2026 · Updated 2 months ago
InternScience / SGI-Bench
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
☆167 · Jun 2, 2026 · Updated last month
ByteVisionLab / NextFlow
NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation
☆331 · Jan 9, 2026 · Updated 6 months ago
HorizonWind2004 / reconstruction-alignment
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potenti…
☆410 · May 23, 2026 · Updated last month
PhoenixZ810 / RISEBench
[NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆154 · May 18, 2026 · Updated 2 months ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆109 · Jul 18, 2025 · Updated last year
open-compass / MMBench-GUI
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w…
☆112 · Sep 8, 2025 · Updated 10 months ago
Visionary-Laboratory / holi-spatial
[ICML 2026 Oral] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
☆366 · Jul 6, 2026 · Updated 2 weeks ago
tulerfeng / Gen-Searcher
Gen-Searcher: Reinforcing Agentic Search for Image Generation
☆376 · Apr 7, 2026 · Updated 3 months ago
baaivision / Emu3.5
Native Multimodal Models are World Learners
☆1,536 · Dec 30, 2025 · Updated 6 months ago
meituan-longcat / LongCat-Image
☆709 · May 9, 2026 · Updated 2 months ago
wusize / OpenUni
☆189 · Jun 27, 2025 · Updated last year
ATH-MaaS / Ovis-U1
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu…
☆450 · Dec 2, 2025 · Updated 7 months ago
wyhlovecpp / GPT-Image-Edit
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
☆243 · Aug 15, 2025 · Updated 11 months ago
OPPO-Mente-Lab / X2Edit
AAAI2026 X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning
☆97 · Nov 21, 2025 · Updated 8 months ago
shawn0728 / Unify-Agent
🐧 Unify-Agent: An end-to-end unified multimodal agent for faithful, knowledge-grounded image generation.
☆86 · May 2, 2026 · Updated 2 months ago
Aria-Zhangjl / E3-FaceNet
[ICML 2024] Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
☆23 · Dec 20, 2024 · Updated last year
stepfun-ai / NextStep-1
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s …
☆689 · Feb 27, 2026 · Updated 4 months ago
VITA-MLLM / Omni-Diffusion
✨✨[ICML 2026] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
☆151 · Mar 12, 2026 · Updated 4 months ago
VisionXLab / CrossEarth-SAR
The official repo of CrossEarth-SAR, a SAR-centric and billion-scale geospatial foundation model for cross-domain semantic segmentation
☆46 · Mar 18, 2026 · Updated 4 months ago
bytedance / mammothmoda
☆331 · May 6, 2026 · Updated 2 months ago
OpenGVLab / NaViL
☆94 · Oct 10, 2025 · Updated 9 months ago
TIGER-AI-Lab / EditReward
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]
☆155 · Apr 11, 2026 · Updated 3 months ago
Visionary-Laboratory / visionary
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
☆514 · Jun 26, 2026 · Updated 3 weeks ago