VITA-MLLM/Omni-Diffusion



✨✨[ICML 2026] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

☆153

Alternatives and similar repositories for Omni-Diffusion

Users interested in Omni-Diffusion are comparing it to the repositories listed below.


Northern-byte-bit / SpeechParaling-Bench
☆30 · May 21, 2026 · Updated 2 months ago
yangruoliu / VideoDetective
VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
☆58 · May 1, 2026 · Updated 2 months ago
VITA-MLLM / VITA-QinYu
VITA-QINYU: Expressive Spoken Language Model for Role-Playing and Singing
☆121 · Jul 14, 2026 · Updated last week
MiG-NJU / EvoEmbedding
EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory
☆52 · Updated this week
MiG-NJU / PersonaVLM
[CVPR 2026 Highlight] PersonaVLM: Long-Term Personalized Multimodal LLMs
☆112 · Apr 16, 2026 · Updated 3 months ago
Tencent / VITA
The official implementation of VITA, VITA-1.5, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆162 · Oct 28, 2025 · Updated 8 months ago
MME-Benchmarks / Video-MME-v2
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆369 · May 24, 2026 · Updated 2 months ago
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42 · Apr 10, 2025 · Updated last year
VITA-MLLM / Sparrow
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
☆32 · Mar 28, 2025 · Updated last year
VITA-MLLM / Long-VITA
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
☆305 · May 14, 2025 · Updated last year
qhfan / UniPrefill
Implementation of "UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification"
☆41 · May 8, 2026 · Updated 2 months ago
ByteDance-Seed / VINCIE
Official code for VINCIE: Unlocking In-context Image Editing from Video
☆60 · Jun 19, 2026 · Updated last month
MAC-AutoML / QuoTA
✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…
☆79 · Apr 28, 2025 · Updated last year
VITA-MLLM / VITA-Audio
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
☆682 · May 24, 2025 · Updated last year
zhourax / VEGA
☆38 · Jul 9, 2024 · Updated 2 years ago
yfzhang114 / Thyme
✨✨ [ICLR 2026] Think Beyond Images
☆583 · Sep 23, 2025 · Updated 10 months ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆198 · May 1, 2025 · Updated last year
OpenGVLab / InternVL-U
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, image edit…
☆291 · Mar 21, 2026 · Updated 4 months ago
VisionChengzhuo / CoF-T2I
Video models as pure visual reasoners for high-quality text-to-image generation via Chain-of-Frame reasoning.
☆39 · Jan 16, 2026 · Updated 6 months ago
AniAggarwal / ecad
[ICLR 2026] Code for Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
☆30 · Mar 1, 2026 · Updated 4 months ago
yfzhang114 / r1_reward
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆291 · May 9, 2025 · Updated last year
Gen-Verse / MMaDA
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)
☆1,660 · Feb 14, 2026 · Updated 5 months ago
ML-GSAI / LLaDA-o
☆53 · May 16, 2026 · Updated 2 months ago
JavisVerse / JavisDiT
[ICLR 2026] Official implementation of JavisDiT and JavisDiT++ series.
☆375 · Mar 29, 2026 · Updated 3 months ago
Fr0zenCrane / Uni-ViGU
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
☆33 · Apr 15, 2026 · Updated 3 months ago
KlingAIResearch / SVG-T2I
[arXiv 2025] Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder…
☆152 · Dec 18, 2025 · Updated 7 months ago
facebookresearch / tuna-2
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
☆739 · Updated this week
Osilly / Interleaving-Reasoning-Generation
[ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench…
☆100 · Jan 26, 2026 · Updated 5 months ago
tang-bd / v-grpo
[CVPR 2026 Findings] V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
☆56 · Apr 28, 2026 · Updated 2 months ago
TencentARC / MindOmni
[NeurIPS 2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
☆139 · Oct 15, 2025 · Updated 9 months ago
Alpha-VLLM / Lumina-DiMOO
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
☆1,003 · May 19, 2026 · Updated 2 months ago
Vchitect / Uni-MMMU
[ACL 2026 Oral] Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
☆25 · Apr 13, 2026 · Updated 3 months ago
NVlabs / DiffusionNFT
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
☆983 · Feb 10, 2026 · Updated 5 months ago
CARE-Edit / Code
[CVPR 2026] A unified editor with four heterogeneous experts via condition-aware router. This repo is the official code for "CARE-Edit: C…
☆33 · Jun 15, 2026 · Updated last month
X-GenGroup / Flow-Factory
A unified framework for easy reinforcement learning in Flow-Matching models
☆639 · Jul 12, 2026 · Updated last week
Lakonik / LakonLab
Official implementation of AsymFlow, pi-Flow, GMFlow
☆452 · Jul 14, 2026 · Updated last week
OpenVE-Team / OpenVE-3M
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
☆51 · Apr 15, 2026 · Updated 3 months ago
Vicky0522 / TokensGen
[ICCV 2025] TokensGen: Harnessing Condensed Tokens for Long Video Generation
☆57 · Dec 10, 2025 · Updated 7 months ago
Ryann-Ran / Scone
(CVPR 2026 Highlight) Official repository for Scone (Subject-driven COmposition and DistinctioN Enhancement) model, supporting subject co…
☆32 · Apr 9, 2026 · Updated 3 months ago