Tencent-Hunyuan / HunyuanVision
☆87Updated 2 months ago
Alternatives and similar repositories for HunyuanVision
Users who are interested in HunyuanVision are comparing it to the libraries listed below.
- ☆78Updated 7 months ago
- Official PyTorch implementation of TokenSet.☆127Updated 9 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"☆70Updated 2 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆37Updated 6 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs".☆95Updated last month
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)☆43Updated last month
- ☆130Updated 6 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆50Updated 10 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated last year
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆113Updated 4 months ago
- An official implementation of SwapAnyone.☆72Updated 9 months ago
- This is the official repository of InfiniteVL☆62Updated last week
- GenExam: A Multidisciplinary Text-to-Image Exam☆50Updated last week
- ☆64Updated 5 months ago
- VCode: SVG as Symbolic Visual Representation☆116Updated last week
- Official repo for UAE☆77Updated this week
- LVAS-Agent Code Base☆21Updated 8 months ago
- ☆140Updated 2 months ago
- PyTorch implementation of NEPA☆196Updated this week
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆73Updated 3 months ago
- 🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"☆162Updated 5 months ago
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching.☆46Updated 5 months ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆52Updated 5 months ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆120Updated 9 months ago
- [Preprint] UCGM: Unified Continuous Generative Models☆171Updated 7 months ago
- Test-time Scaling for VAR models☆26Updated 3 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆32Updated 5 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆62Updated 8 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 5 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)☆86Updated 10 months ago