Tencent-Hunyuan / HunyuanVision
☆83 · Updated last month
Alternatives and similar repositories for HunyuanVision
Users who are interested in HunyuanVision are comparing it to the libraries listed below.
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆123 · Updated last month
- ☆78 · Updated 7 months ago
- Official PyTorch implementation of TokenSet. ☆127 · Updated 8 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ☆70 · Updated last month
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025) ☆43 · Updated last week
- LVAS-Agent Code Base ☆21 · Updated 7 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆94 · Updated last month
- VideoNSA: Native Sparse Attention Scales Video Understanding ☆68 · Updated 3 weeks ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 · Updated 9 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆120 · Updated 3 weeks ago
- ☆135 · Updated last month
- GenExam: A Multidisciplinary Text-to-Image Exam ☆48 · Updated last week
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ☆194 · Updated 5 months ago
- VCode: SVG as Symbolic Visual Representation ☆112 · Updated 2 weeks ago
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆40 · Updated 8 months ago
- FQGAN: Factorized Visual Tokenization and Generation ☆56 · Updated 8 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆62 · Updated 4 months ago
- 🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction" ☆161 · Updated 4 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows ☆119 · Updated 3 months ago
- Quick Long Video Understanding ☆70 · Updated last month
- ☆130 · Updated 5 months ago
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆74 · Updated last month
- Implementation of the proposed LVMAE, from the paper "Extending Video Masked Autoencoders to 128 frames", in PyTorch ☆55 · Updated last year
- ☆62 · Updated 3 months ago
- VideoAuteur: Towards Long Narrative Video Generation ☆43 · Updated last month
- Video-LLaVA fine-tune for CinePile evaluation ☆51 · Updated last year
- [CVPR 2025] Parallel Sequence Modeling via Generalized Spatial Propagation Network ☆108 · Updated 4 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 4 months ago
- [ICLR 2025] Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching ☆52 · Updated 7 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆73 · Updated last year