[NeurIPS 2025] UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
☆85Jul 14, 2025Updated 7 months ago
Alternatives and similar repositories for UltraVideo
Users that are interested in UltraVideo are comparing it to the libraries listed below
- [T-PAMI 2025] EMOv2: Pushing 5M Vision Model Frontier☆54Dec 30, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- official implementation of the paper "Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability".☆48Dec 25, 2025Updated 2 months ago
- This project is the official implementation of 'DreamOmni3: Scribble-based Editing and Generation'☆38Dec 30, 2025Updated 2 months ago
- ☆28Mar 4, 2025Updated last year
- Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models (ICLR 2026)☆42Updated this week
- PyTorch implementation of Self-Refining Video Sampling☆146Feb 6, 2026Updated last month
- ControlNet module for Wan2.2☆42Oct 30, 2025Updated 4 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 4 months ago
- An efficient distillation method for flow matching models☆24Feb 1, 2026Updated last month
- [ICLR 2026] Lumos Project: Frontier video unified model research by Alibaba DAMO Academy.☆152Jan 27, 2026Updated last month
- ☆85Nov 16, 2025Updated 3 months ago
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models.☆21Jul 3, 2025Updated 8 months ago
- Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing"☆79Dec 12, 2025Updated 2 months ago
- Official implementation of "Towards One-Step Causal Video Generation via Adversarial Self-Distillation" (arXiv 2025). A novel framework f…☆25Nov 4, 2025Updated 4 months ago
- [NeurIPS 2025] The official code for "IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation"☆22Jun 5, 2025Updated 9 months ago
- [Preprint] Efficient Generative Model Training via Embedded Representation Warmup☆36Oct 15, 2025Updated 4 months ago
- ☆52Jun 24, 2025Updated 8 months ago
- [TMM 2025] Official Implementation of DreamJourney: Perpetual View Generation with Video Diffusion Models☆17Jun 24, 2025Updated 8 months ago
- ☆23Jul 20, 2025Updated 7 months ago
- ControlNet module for Wan2.1☆30Aug 4, 2025Updated 7 months ago
- Code for CineScale, higher-resolution video generation based on Wan☆185Aug 25, 2025Updated 6 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models☆125Oct 14, 2025Updated 4 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆39Jun 14, 2025Updated 8 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- Repository for ‘Anomaly Detection and Generation with Diffusion Models: A Survey’.☆36Jun 15, 2025Updated 8 months ago
- [ICLR 2026] UniVideo: Unified Understanding, Generation, and Editing for Videos☆438Feb 11, 2026Updated 3 weeks ago
- lite attention implemented over flash attention 3☆45Updated this week
- A Comprehensive Dataset for Advanced Image Generation and Editing☆31Oct 2, 2025Updated 5 months ago
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos☆22Jan 26, 2026Updated last month
- CVPR 2026 | Official Implementation of "MultiShotMaster: A Controllable Multi-Shot Video Generation Framework" 🔥☆102Feb 22, 2026Updated 2 weeks ago
- The raw UserRL repo under construction☆97Sep 25, 2025Updated 5 months ago
- ☆114Jun 28, 2024Updated last year
- Official Implementation of "Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry"☆31Nov 10, 2025Updated 3 months ago
- ☆16Jul 23, 2024Updated last year
- OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models☆154Updated this week
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆21Jan 29, 2025Updated last year
- Official code for VINCIE: Unlocking In-context Image Editing from Video☆48Sep 8, 2025Updated 6 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year