TencentARC/MindOmni

[NeurIPS2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

☆139

Alternatives and similar repositories for MindOmni

Users who are interested in MindOmni are comparing it to the repositories listed below.

Tencent / HaploVLM
ICML2025
☆63 · Aug 28, 2025 · Updated 10 months ago
PKU-YuanGroup / UniWorld
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
☆884 · Dec 23, 2025 · Updated 7 months ago
EasonXiao-888 / SpatialEdit
[Official Repo] SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
☆214 · Apr 13, 2026 · Updated 3 months ago
TencentARC / DSR_Suite
☆74 · Apr 21, 2026 · Updated 3 months ago
facebookresearch / metaquery
Official Implementation of Paper Transfer between Modalities with MetaQueries
☆325 · Oct 12, 2025 · Updated 9 months ago
Fr0zenCrane / UniCoT
[ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆234 · May 31, 2026 · Updated last month
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆236 · Aug 18, 2025 · Updated 11 months ago
TencentARC / ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
☆239 · Sep 21, 2025 · Updated 10 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆727 · Sep 24, 2025 · Updated 10 months ago
ATH-MaaS / Ovis-U1
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu…
☆450 · Dec 2, 2025 · Updated 7 months ago
TencentARC / SEED-Bench-R1
☆100 · Jun 23, 2025 · Updated last year
facebookresearch / metamorph
Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning
☆235 · Jan 22, 2026 · Updated 6 months ago
JiuhaiChen / BLIP3o
Official implementation of BLIP3o-Series
☆1,664 · Nov 29, 2025 · Updated 7 months ago
X-Omni-Team / X-Omni
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
☆426 · Aug 26, 2025 · Updated 11 months ago
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆423 · Jan 29, 2026 · Updated 5 months ago
csuhan / Tar
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
☆202 · Sep 18, 2025 · Updated 10 months ago
HorizonWind2004 / reconstruction-alignment
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potenti…
☆411 · May 23, 2026 · Updated 2 months ago
mm-vl / ULM-R1
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
☆48 · Jul 22, 2025 · Updated last year
wangjiangshan0725 / COVE
[NeurIPS 2024] COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
☆26 · Dec 8, 2024 · Updated last year
RobertLuo1 / NeurIPS2023_SOC
[NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
☆33 · Mar 16, 2024 · Updated 2 years ago
TencentARC / Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95 · Jul 13, 2025 · Updated last year
FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆529 · Nov 14, 2025 · Updated 8 months ago
Osilly / Interleaving-Reasoning-Generation
[ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench…
☆100 · Jan 26, 2026 · Updated 5 months ago
baaivision / Emu3.5
Native Multimodal Models are World Learners
☆1,538 · Dec 30, 2025 · Updated 6 months ago
Franklin-Zhang0 / ReasonGen-R1
Official repository for ReasonGen-R1
☆75 · Jun 23, 2025 · Updated last year
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex
☆796 · Jun 18, 2026 · Updated last month
FreedomIntelligence / ShareGPT-4o-Image
☆285 · Jul 22, 2025 · Updated last year
TencentARC / ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆44 · Nov 19, 2025 · Updated 8 months ago
ZiyuGuo99 / Image-Generation-CoT
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
☆865 · Mar 19, 2026 · Updated 4 months ago
wusize / OpenUni
☆189 · Jun 27, 2025 · Updated last year
appletea233 / EditThinker
Unlocking Iterative Reasoning for Any Image Editor
☆112 · Jan 18, 2026 · Updated 6 months ago
yifan123 / flow_grpo
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
☆2,430 · May 7, 2026 · Updated 2 months ago
EasonXiao-888 / MambaTree
[NeurIPS2024 Spotlight] The official implementation of MambaTree: Tree Topology is All You Need in State Space Model
☆113 · Jun 13, 2024 · Updated 2 years ago
EasonXiao-888 / UVCOM
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
☆117 · Jul 17, 2024 · Updated 2 years ago
lyk412 / Consistent123
[ACMMM 2024] Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors
☆25 · Oct 22, 2024 · Updated last year
zai-org / VisionReward
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
☆422 · Mar 26, 2025 · Updated last year
inclusionAI / Ming-UniVision
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
☆143 · Oct 14, 2025 · Updated 9 months ago
hzphzp / WeGen
☆27 · Apr 25, 2025 · Updated last year
jiaosiyuu / ThinkGen
ThinkGen: Generalized Thinking for Visual Generation
☆61 · Dec 30, 2025 · Updated 6 months ago