Tencent-Hunyuan / HunyuanVision
☆94 · Updated 3 months ago
Alternatives and similar repositories for HunyuanVision
Users who are interested in HunyuanVision are comparing it to the repositories listed below.
- Official PyTorch implementation of TokenSet. ☆127 · Updated 10 months ago
- [ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs ☆96 · Updated 2 weeks ago
- ☆77 · Updated 9 months ago
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆42 · Updated 11 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ☆73 · Updated 3 months ago
- ☆37 · Updated 2 months ago
- ☆132 · Updated 7 months ago
- ☆141 · Updated 3 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆39 · Updated 7 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ☆51 · Updated 11 months ago
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning ☆33 · Updated 5 months ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think! ☆121 · Updated 11 months ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆52 · Updated 6 months ago
- An official implementation of SwapAnyone. ☆74 · Updated 10 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆79 · Updated 2 months ago
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025) ☆44 · Updated 2 months ago
- official implementation of the paper "Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability". ☆45 · Updated last month
- Quick Long Video Understanding [TMLR2025] ☆74 · Updated 3 months ago
- The SAIL-VL2 series model developed by the BytedanceDouyinContent Group ☆76 · Updated 4 months ago
- ☆63 · Updated 7 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆47 · Updated 7 months ago
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models ☆77 · Updated last week
- Test-time Scaling for VAR models ☆31 · Updated 4 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆119 · Updated 6 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 · Updated 11 months ago
- VideoNSA: Native Sparse Attention Scales Video Understanding ☆79 · Updated 2 months ago
- GenExam: A Multidisciplinary Text-to-Image Exam ☆56 · Updated last week
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching. ☆47 · Updated 6 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆75 · Updated 4 months ago
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆79 · Updated 3 months ago