shxie2020 / Awesome-UGVFM

A collection of vision foundation models unifying understanding and generation.

☆59

Alternatives and similar repositories for Awesome-UGVFM

Users interested in Awesome-UGVFM are comparing it to the repositories listed below.


wusize / Harmon
[ICCV2025] Code release of "Harmonizing Visual Representations for Unified Multimodal Understanding and Generation"
☆178 · Updated 6 months ago
PKU-YuanGroup / UAE
Official repository for the UAE paper, unified-GRPO, and unified-Bench
☆150 · Updated 2 months ago
PKU-YuanGroup / WISE
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
☆165 · Updated last month
aHapBean / VideoREPA
[NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
☆126 · Updated last month
facebookresearch / metaquery
Official implementation of the paper "Transfer between Modalities with MetaQueries"
☆272 · Updated last month
rongyaofang / PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
☆131 · Updated 10 months ago
ziqipang / RandAR
[CVPR 2025 (Oral)] Open implementation of "RandAR"
☆200 · Updated 4 months ago
wusize / OpenUni
☆163 · Updated 5 months ago
Franklin-Zhang0 / ReasonGen-R1
Official repository for ReasonGen-R1
☆73 · Updated 5 months ago
OpenGVLab / PhyGenBench
[ICML2025] Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"
☆138 · Updated last year
PKU-YuanGroup / UniSandBox
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆49 · Updated last week
PhoenixZ810 / RISEBench
[NeurIPS 2025 D&B Oral] Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing"
☆120 · Updated 2 weeks ago
HorizonWind2004 / reconstruction-alignment
Official repository of the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie…
☆316 · Updated last month
SAIS-FUXI / Omni-Video
☆61 · Updated 4 months ago
AMAP-ML / USP
[ICCV25] USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
☆87 · Updated last month
yongliu20 / Awesome-Unified-Understanding-and-Generation
☆51 · Updated 3 months ago
csuhan / Tar
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
☆188 · Updated 2 months ago
rongyaofang / GoT
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
☆299 · Updated 2 months ago
aim-uofa / dLLM-MidTruth
☆55 · Updated 3 months ago
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆177 · Updated 2 weeks ago
gogoduan / GoT-R1
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
☆100 · Updated 6 months ago
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆131 · Updated 3 months ago
hu-zijing / B2-DiffuRL
[CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.
☆49 · Updated 8 months ago
aniki-ly / FreeLong
[NeurIPS 2024] The official implementation of the paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…
☆62 · Updated 5 months ago
thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…
☆186 · Updated last month
Chenyu-Wang567 / All-Angles-Bench
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
☆55 · Updated 2 weeks ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆104 · Updated last month
PKU-YuanGroup / N-LoRA
【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?".
☆36 · Updated last year
tliby / UniFork
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
☆46 · Updated 3 months ago
Jiawei-Yang / DeTok
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
☆159 · Updated last month