Tencent / HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
☆3,805 Updated this week
Alternatives and similar repositories for HunyuanDiT:
Users who are interested in HunyuanDiT are comparing it to the libraries listed below:
- Kolors Team ☆4,108 Updated 2 months ago
- An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion ☆1,680 Updated this week
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text ☆1,481 Updated last month
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,204 Updated 4 months ago
- Latte: Latent Diffusion Transformer for Video Generation. ☆1,756 Updated 3 months ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,126 Updated 5 months ago
- [ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors ☆2,717 Updated 4 months ago
- High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance ☆2,092 Updated 3 months ago
- PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation ☆1,738 Updated 2 months ago
- The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt. ☆5,547 Updated 6 months ago
- ☆1,353 Updated last month
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation ☆8,119 Updated 4 months ago
- Unofficial Implementation of Animate Anyone ☆2,888 Updated 6 months ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions ☆2,709 Updated 3 weeks ago
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance ☆4,117 Updated 6 months ago
- VideoSys: An easy and efficient system for video generation ☆1,875 Updated 2 weeks ago
- MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising ☆2,559 Updated 6 months ago
- Transparent Image Layer Diffusion using Latent Transparency ☆2,052 Updated 7 months ago
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis ☆2,931 Updated 2 months ago
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation ☆1,734 Updated 3 months ago
- Next-Token Prediction is All You Need ☆1,965 Updated 2 months ago
- Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA ☆1,481 Updated 3 months ago
- [ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG) ☆1,739 Updated 3 weeks ago
- Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing> ☆4,479 Updated 6 months ago
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance ☆6,793 Updated 3 weeks ago
- Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆4,207 Updated this week
- Code of Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,701 Updated 3 weeks ago
- Official repository of In-Context LoRA for Diffusion Transformers ☆1,480 Updated 3 weeks ago
- OneDiff: An out-of-the-box acceleration library for diffusion models. ☆1,758 Updated this week
- text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023) ☆10,303 Updated this week