Tencent-Hunyuan / HunyuanVision
☆93 Updated 3 months ago
Alternatives and similar repositories for HunyuanVision
Users who are interested in HunyuanVision are comparing it to the repositories listed below.
- ☆77 Updated 8 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ☆72 Updated 3 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆96 Updated 2 months ago
- Official PyTorch implementation of TokenSet. ☆127 Updated 10 months ago
- VCode: SVG as Symbolic Visual Representation ☆120 Updated last month
- ☆132 Updated 6 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆39 Updated 7 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ☆50 Updated 11 months ago
- ☆141 Updated 3 months ago
- An official implementation of SwapAnyone. ☆73 Updated 10 months ago
- ☆81 Updated 3 weeks ago
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head ☆100 Updated this week
- LVAS-Agent Code Base ☆22 Updated 9 months ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models ☆161 Updated 2 weeks ago
- ☆95 Updated last year
- 🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction" ☆165 Updated 6 months ago
- The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation ☆39 Updated 8 months ago
- [Preprint] UCGM: Unified Continuous Generative Models ☆176 Updated 7 months ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think! ☆120 Updated 10 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆114 Updated 5 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆141 Updated last month
- Make self forcing endless. Add cache purging. Add prompt controllability. ☆68 Updated 4 months ago
- The code implementation for the paper "DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation". ☆29 Updated 4 months ago
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning ☆33 Updated 4 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 Updated 8 months ago
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign) ☆80 Updated 8 months ago
- AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model ☆52 Updated 3 months ago
- Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its… ☆152 Updated this week
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆52 Updated 5 months ago
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025) ☆44 Updated last month