bytedance / USO
🔥🔥 Open-sourced unified customization model
★1,199 · Updated 3 months ago
Alternatives and similar repositories for USO
Users who are interested in USO are comparing it to the repositories listed below.
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning · ★1,050 · Updated last week
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation · ★665 · Updated 2 months ago
- [ArXiv 25] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling · ★832 · Updated this week
- Qwen-Image-Lightning: Speed up Qwen-Image model with distillation · ★1,105 · Updated last week
- Official inference repo for FLUX.2 models · ★1,289 · Updated last month
- HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation · ★2,617 · Updated 2 months ago
- Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length" · ★1,206 · Updated 2 weeks ago
- ComfyUI node for highly expressive speech and realistic zero-shot voice cloning · ★354 · Updated 2 weeks ago
- ★701 · Updated last month
- ★1,485 · Updated last month
- ★1,728 · Updated last week
- [SIGGRAPH Asia 25] Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off · ★331 · Updated 2 months ago
- ★783 · Updated 5 months ago
- Pusa: Thousands Timesteps Video Diffusion Model · ★669 · Updated 3 months ago
- Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation. · ★702 · Updated last week
- Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives · ★570 · Updated last month
- Official GitHub repository for FLUX.1 Krea [dev]. · ★358 · Updated 5 months ago
- [ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning · ★1,343 · Updated 3 months ago
- Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation. · ★785 · Updated 4 months ago
- ComfyDeployed · ★438 · Updated 3 months ago
- Qwen-Image-Layered: Layered Decomposition for Inherent Editability · ★1,030 · Updated last week
- ★1,044 · Updated 7 months ago
- PersonaLive!: Expressive Portrait Image Animation for Live Streaming · ★1,075 · Updated this week
- In-context subject-driven image generation while preserving foreground fidelity · ★350 · Updated 6 months ago
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation · ★1,196 · Updated 2 months ago
- Lumina-Image 2.0: A Unified and Efficient Image Generative Framework · ★844 · Updated last month
- [Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset · ★541 · Updated 2 months ago
- MoCha: End-to-End Video Character Replacement without Structural Guidance · ★525 · Updated last month
- ObjectClear: Complete Object Removal via Object-Effect Attention · ★519 · Updated last month
- ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio · ★547 · Updated 3 months ago