bytedance / USO
🔥🔥 Open-sourced unified customization model
★1,153 · Updated last month
Alternatives and similar repositories for USO
Users who are interested in USO are comparing it to the libraries listed below.
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation ★649 · Updated last week
- Qwen-Image-Lightning: Speed up Qwen-Image model with distillation ★855 · Updated last week
- ComfyUI node for highly expressive speech and realistic zero-shot voice cloning ★300 · Updated last week
- ★779 · Updated 3 months ago
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning ★753 · Updated last week
- Official GitHub repository for FLUX.1 Krea [dev]. ★348 · Updated 2 months ago
- HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation ★2,250 · Updated last week
- ★623 · Updated last month
- [ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning ★1,317 · Updated last month
- ComfyDeployed ★423 · Updated last month
- [SIGGRAPH Asia 25] Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off ★320 · Updated last week
- Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation. ★658 · Updated last month
- In-context subject-driven image generation while preserving foreground fidelity ★350 · Updated 4 months ago
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation ★1,187 · Updated last week
- Towards Real-Time Diffusion-Based Streaming Video Super-Resolution: An efficient one-step diffusion framework for streaming VSR with loc… ★392 · Updated this week
- Lumina-Image 2.0: A Unified and Efficient Image Generative Framework ★808 · Updated 4 months ago
- Pusa: Thousands Timesteps Video Diffusion Model ★658 · Updated last month
- FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers ★473 · Updated 2 months ago
- ★1,038 · Updated 5 months ago
- ★1,904 · Updated last week
- ★998 · Updated 2 weeks ago
- Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment ★1,440 · Updated last month
- ObjectClear: Complete Object Removal via Object-Effect Attention ★487 · Updated last month
- F Lite is a 10B-parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ★414 · Updated 2 months ago
- Streamlining Cartoon Production with Generative Post-Keyframing ★449 · Updated 2 months ago
- ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio ★489 · Updated last month
- ICCV 2025 ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control… ★416 · Updated 2 months ago
- ★753 · Updated 8 months ago
- Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation. ★650 · Updated 2 months ago
- ★283 · Updated last month