☆51 · Aug 22, 2025 · Updated 6 months ago
Alternatives and similar repositories for Awesome-Unified-Understanding-and-Generation
Users who are interested in Awesome-Unified-Understanding-and-Generation are comparing it to the libraries listed below
- [CVPR 2024] Narrative Action Evaluation with Prompt-Guided Multimodal Interaction ☆42 · May 16, 2024 · Updated last year
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆87 · Oct 15, 2025 · Updated 4 months ago
- ☆22 · May 9, 2024 · Updated last year
- [CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation ☆24 · Jan 21, 2025 · Updated last year
- The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery" ☆25 · Jul 25, 2024 · Updated last year
- Image Tokenizer Needs Post-Training ☆24 · Oct 4, 2025 · Updated 5 months ago
- The official code of "Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling" ☆47 · Feb 26, 2026 · Updated last week
- ☆13 · Sep 2, 2023 · Updated 2 years ago
- ☆15 · May 4, 2025 · Updated 10 months ago
- [ICCV 2025] AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation ☆98 · Jun 26, 2025 · Updated 8 months ago
- [TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation ☆65 · Dec 22, 2025 · Updated 2 months ago
- Chat about anything on any video! ☆39 · Sep 5, 2023 · Updated 2 years ago
- [ICCV 2025] Prompt-A-Video ☆22 · Feb 2, 2025 · Updated last year
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆98 · Feb 11, 2025 · Updated last year
- [ICCV 2025] Code for Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction ☆171 · Dec 15, 2025 · Updated 2 months ago
- ☆17 · Feb 26, 2024 · Updated 2 years ago
- ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation ☆122 · May 8, 2025 · Updated 10 months ago
- A paper list that includes world models or generative video models for embodied agents. ☆26 · Jan 17, 2025 · Updated last year
- [CVPR 2023] LOGO: A Long-Form Video Dataset for Group Action Quality Assessment ☆46 · Apr 9, 2024 · Updated last year
- Towards training VQ-VAE models robustly! ☆93 · Jul 14, 2025 · Updated 7 months ago
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". ☆53 · Dec 28, 2025 · Updated 2 months ago
- The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery", CVPR 2024 ☆45 · Jun 4, 2024 · Updated last year
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. ☆98 · Jan 3, 2025 · Updated last year
- ☆22 · Sep 26, 2024 · Updated last year
- Distributed parallel 3D-Causal-VAE for efficient training and inference ☆47 · Aug 20, 2025 · Updated 6 months ago
- An implementation of 'simple diffusion: End-to-end diffusion for high resolution images' as published by Hoogeboom et al. ☆40 · Feb 9, 2025 · Updated last year
- Curated list of recent visual autoregressive (VAR) modeling works ☆30 · Mar 17, 2025 · Updated 11 months ago
- implementation of "Action Quality Assessment with Temporal Parsing Transformer" ☆24 · Aug 2, 2022 · Updated 3 years ago
- assistant tools for attention visualization in deep learning ☆29 · Aug 4, 2022 · Updated 3 years ago
- NEAT: Distilling 3D Wireframes from Neural Attraction Fields (CVPR 2024) ☆73 · Mar 29, 2024 · Updated last year
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 · Oct 21, 2024 · Updated last year
- code for COLING paper "A Hybrid Model of Classification and Generation for Spatial Relation Extraction" ☆10 · Oct 20, 2022 · Updated 3 years ago
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" ☆425 · Jun 20, 2025 · Updated 8 months ago
- [ICLR 2026] Official Repo for Rolling Forcing: Autoregressive Long Video Diffusion in Real Time ☆333 · Oct 31, 2025 · Updated 4 months ago
- [CVPR 2026] Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction ☆198 · Jan 14, 2026 · Updated last month
- [ICCV 2025] PyTorch implementation of "VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Pr… ☆50 · Jul 28, 2025 · Updated 7 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆80 · Dec 10, 2024 · Updated last year
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆76 · Sep 23, 2024 · Updated last year
- [3DV 2024] Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization ☆33 · Mar 17, 2025 · Updated 11 months ago