yongliu20/Awesome-Unified-Understanding-and-Generation

yongliu20 / Awesome-Unified-Understanding-and-Generation

☆52

Alternatives and similar repositories for Awesome-Unified-Understanding-and-Generation

Users that are interested in Awesome-Unified-Understanding-and-Generation are comparing it to the libraries listed below.

AndyTang15 / FLAG3Dv2
☆25 · May 9, 2024 · Updated 2 years ago
shiyi-zh0408 / NAE_CVPR2024
[CVPR 2024] Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
☆43 · May 16, 2024 · Updated 2 years ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102 · Oct 15, 2025 · Updated 9 months ago
AndyTang15 / FLAG3D
☆19 · Jun 22, 2026 · Updated last month
shiyi-zh0408 / Meta-CoT
[CVPR 2026] Official code of the paper "Meta-CoT: Enhancing Granularity and Generalization in Image Editing"
☆79 · May 6, 2026 · Updated 2 months ago
Yxxxb / LAVT-RS
[CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation
☆26 · Jan 21, 2025 · Updated last year
Tengbo-Yu / AnyBimanual
[ICCV 2025] AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
☆102 · Jun 26, 2025 · Updated last year
SuleBai / SC-CLIP
[TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
☆73 · Mar 27, 2026 · Updated 4 months ago
EternalEvan / DPMesh
The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery", CVPR 2024
☆45 · Jun 4, 2024 · Updated 2 years ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205 · Jun 18, 2025 · Updated last year
VoyageWang / VG-Refiner
The repository of VG-Refiner paper
☆20 · Dec 9, 2025 · Updated 7 months ago
InvincibleWyq / ChatVID
Chat about anything on any video!
☆39 · Sep 5, 2023 · Updated 2 years ago
YuLiu-LY / SlotLifter
Code for "SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields" (ECCV 2024)
☆12 · Oct 30, 2024 · Updated last year
shiyi-zh0408 / LOGO
[CVPR 2023] LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
☆48 · Apr 9, 2024 · Updated 2 years ago
AMAP-ML / UniVG-R1
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
☆167 · Jun 2, 2025 · Updated last year
IVGSZ / Flash-VStream
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆287 · Oct 15, 2025 · Updated 9 months ago
EternalEvan / FlowIE
[CVPR 2024 oral] This repository contains the official implementation of "FlowIE: Efficient Image Enhancement via Rectified Flow"
☆153 · Jan 13, 2025 · Updated last year
ManiCM-fast / ManiCM
ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation
☆125 · May 8, 2025 · Updated last year
Jixuan-Fan / Momentum-GS
[ICCV 2025] Code for Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
☆173 · Dec 15, 2025 · Updated 7 months ago
RobertLuo1 / CoHD
The official implementation of A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
☆27 · Aug 17, 2025 · Updated 11 months ago
yongliu20 / SCAN
[CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration"
☆77 · Sep 23, 2024 · Updated last year
RammusLeo / ScoreHOI
Official repository of ScoreHOI (ICCV 2025)
☆16 · Dec 21, 2025 · Updated 7 months ago
sherwinbahmani / threed_front_rendering
☆13 · Sep 2, 2023 · Updated 2 years ago
Vchitect / Uni-MMMU
[ACL 2026 oral] Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
☆26 · Apr 13, 2026 · Updated 3 months ago
RammusLeo / DPMesh
The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery"
☆25 · Jul 25, 2024 · Updated 2 years ago
Dai-Wenxun / MotionLCM
[ECCV 2024] MotionLCM: This repo is the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Cons…
☆463 · Feb 24, 2025 · Updated last year
shiyi-zh0408 / FlexiAct
[SIGGRAPH 2025] Official code of the paper "FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios"
☆341 · Oct 30, 2025 · Updated 8 months ago
ChangyuanWang17 / QVLM
[NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models.
☆103 · Jan 3, 2025 · Updated last year
TencentARC / Divot
Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)
☆87 · Feb 27, 2025 · Updated last year
xyfJASON / diffusion-models-pytorch
Implement Diffusion Models with PyTorch.
☆28 · Nov 24, 2024 · Updated last year
VIPL-GENUN / Jodi
Jodi: Unification of Visual Generation and Understanding via Joint Modeling
☆92 · Mar 6, 2026 · Updated 4 months ago
GuanxingLu / Subspace-Clustering
[IEEE TCSVT 2023] The implementation of our paper Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation.
☆26 · Dec 21, 2023 · Updated 2 years ago
ChrisDong-THU / GaussianToken
Official PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
☆108 · Apr 3, 2025 · Updated last year
HELLORPG / CV-Framework
A simple computer vision framework, mainly based on PyTorch, including distributed training, logging, and so on.
☆12 · Dec 2, 2023 · Updated 2 years ago
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆236 · Aug 18, 2025 · Updated 11 months ago
GuanxingLu / vlarl
Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning.
☆447 · Nov 8, 2025 · Updated 8 months ago
LynnHo / Make-Workspace
A better shell
☆16 · Updated this week
xk-huang / segment-caption-anything
[CVPR'24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin…
☆233 · Sep 30, 2024 · Updated last year
TencentARC / ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
☆239 · Sep 21, 2025 · Updated 10 months ago