bytedance / USO
🔥🔥 Open-sourced unified customization model
★1,194 · Updated 3 months ago
Alternatives and similar repositories for USO
Users who are interested in USO are comparing it to the libraries listed below.
- Qwen-Image-Lightning: Speed up Qwen-Image model with distillation ★1,031 · Updated last week
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation ★661 · Updated last month
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning ★994 · Updated last month
- [ArXiv 25] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling ★702 · Updated this week
- HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation ★2,561 · Updated last month
- Official GitHub repository for FLUX.1 Krea [dev]. ★358 · Updated 4 months ago
- ★779 · Updated 4 months ago
- ComfyUI node for highly expressive speech and realistic zero-shot voice cloning ★320 · Updated last month
- Official inference repo for FLUX.2 models ★1,170 · Updated last week
- Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives ★539 · Updated 2 weeks ago
- Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length" ★243 · Updated last week
- Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation. ★687 · Updated 3 months ago
- ★1,399 · Updated 3 weeks ago
- ★1,317 · Updated last month
- [ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning ★1,335 · Updated 3 months ago
- ★696 · Updated last month
- ObjectClear: Complete Object Removal via Object-Effect Attention ★508 · Updated 2 weeks ago
- Pusa: Thousands Timesteps Video Diffusion Model ★666 · Updated 3 months ago
- [SIGGRAPH Asia 25] Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off ★323 · Updated last month
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation ★1,196 · Updated last month
- ★1,043 · Updated 6 months ago
- In-context subject-driven image generation while preserving foreground fidelity ★350 · Updated 6 months ago
- ComfyDeployed ★433 · Updated 2 months ago
- ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio ★529 · Updated 2 months ago
- Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation. ★762 · Updated 3 months ago
- FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers ★485 · Updated 3 months ago
- MoCha: End-to-End Video Character Replacement without Structural Guidance ★501 · Updated 3 weeks ago
- [Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset ★522 · Updated last month
- Lumina-Image 2.0: A Unified and Efficient Image Generative Framework ★833 · Updated last month
- Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment ★1,461 · Updated 3 months ago