yongliu20 / Awesome-Unified-Understanding-and-Generation
☆51 · Aug 22, 2025 · Updated 5 months ago
Alternatives and similar repositories for Awesome-Unified-Understanding-and-Generation
Users who are interested in Awesome-Unified-Understanding-and-Generation are comparing it to the repositories listed below
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆81 · Oct 15, 2025 · Updated 4 months ago
- [CVPR 2024] Narrative Action Evaluation with Prompt-Guided Multimodal Interaction ☆42 · May 16, 2024 · Updated last year
- ☆22 · May 9, 2024 · Updated last year
- [CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation ☆24 · Jan 21, 2025 · Updated last year
- The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery" ☆25 · Jul 25, 2024 · Updated last year
- Awesome latest models, datasets and benchmarks on streaming/online video understanding. ☆24 · Oct 19, 2025 · Updated 3 months ago
- Image Tokenizer Needs Post-Training ☆24 · Oct 4, 2025 · Updated 4 months ago
- ☆13 · Sep 2, 2023 · Updated 2 years ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆204 · Jun 18, 2025 · Updated 8 months ago
- ☆14 · May 4, 2025 · Updated 9 months ago
- Official PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆101 · Apr 3, 2025 · Updated 10 months ago
- [TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation ☆58 · Dec 22, 2025 · Updated last month
- [ICCV 2025] AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation ☆97 · Jun 26, 2025 · Updated 7 months ago
- Chat about anything on any video! ☆38 · Sep 5, 2023 · Updated 2 years ago
- [ICCV 2025] Prompt-A-Video ☆22 · Feb 2, 2025 · Updated last year
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆98 · Feb 11, 2025 · Updated last year
- [ICCV 2025] Code for Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction ☆169 · Dec 15, 2025 · Updated 2 months ago
- ☆17 · Feb 26, 2024 · Updated last year
- ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation ☆122 · May 8, 2025 · Updated 9 months ago
- [IEEE TCSVT 2023] The implementation of our paper Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation. ☆25 · Dec 21, 2023 · Updated 2 years ago
- A paper list that includes world models or generative video models for embodied agents. ☆26 · Jan 17, 2025 · Updated last year
- Towards training VQ-VAE models robustly! ☆91 · Jul 14, 2025 · Updated 7 months ago
- [CVPR 2023] LOGO: A Long-Form Video Dataset for Group Action Quality Assessment ☆46 · Apr 9, 2024 · Updated last year
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". ☆52 · Dec 28, 2025 · Updated last month
- [NeurIPS'24] Efficient and accurate memory saving method towards W4A4 large multi-modal models. ☆97 · Jan 3, 2025 · Updated last year
- ☆22 · Sep 26, 2024 · Updated last year
- [ECCV 2024] MotionLCM: This repo is the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Cons… ☆443 · Feb 24, 2025 · Updated 11 months ago
- Stability-AI's SV3D (ECCV 2024 oral, Voleti et al.) in the diffusers convention. ☆31 · Feb 5, 2025 · Updated last year
- An implementation of 'simple diffusion: End-to-end diffusion for high resolution images' as published by Hoogeboom et al. ☆37 · Feb 9, 2025 · Updated last year
- Curated list of recent visual autoregressive (VAR) modeling works ☆30 · Mar 17, 2025 · Updated 11 months ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆52 · Jul 24, 2025 · Updated 6 months ago
- Assistant tools for attention visualization in deep learning ☆29 · Aug 4, 2022 · Updated 3 years ago
- Implementation of "Action Quality Assessment with Temporal Parsing Transformer" ☆24 · Aug 2, 2022 · Updated 3 years ago
- ☆32 · Dec 20, 2023 · Updated 2 years ago
- NEAT: Distilling 3D Wireframes from Neural Attraction Fields (CVPR 2024) ☆73 · Mar 29, 2024 · Updated last year
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 · Oct 21, 2024 · Updated last year
- [ICLR 2026] Official Repo for Rolling Forcing: Autoregressive Long Video Diffusion in Real Time ☆323 · Oct 31, 2025 · Updated 3 months ago
- [ICCV 2025] Amodal Depth Anything: Amodal Depth Estimation in the Wild ☆39 · Jan 26, 2026 · Updated 3 weeks ago
- Official code for the paper: Can3Tok (ICCV 2025) ☆39 · Aug 23, 2025 · Updated 5 months ago