[CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
☆47 · Jul 5, 2025 · Updated 7 months ago
Alternatives and similar repositories for InstanceCap
Users who are interested in InstanceCap are comparing it to the repositories listed below.
- CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation ☆36 · Aug 1, 2025 · Updated 6 months ago
- [ICLR 2026] MotionSight's official code implementation. ☆46 · Feb 13, 2026 · Updated 2 weeks ago
- TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes ☆93 · Nov 26, 2025 · Updated 3 months ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ☆399 · May 30, 2025 · Updated 9 months ago
- [ICCV 2025] Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥 ☆620 · Dec 12, 2025 · Updated 2 months ago
- This is the official repository of UltraHR-100K. ☆44 · Nov 21, 2025 · Updated 3 months ago
- A repo for generating random NFTs with metadata 100% on chain! ☆37 · Mar 8, 2024 · Updated last year
- ☆120 · Jan 8, 2025 · Updated last year
- [CVPR 2025] Official code of "From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Persp… ☆49 · Apr 2, 2025 · Updated 10 months ago
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆113 · Dec 4, 2025 · Updated 2 months ago
- ☆13 · May 17, 2025 · Updated 9 months ago
- Latest Advances on Autoregressive Visual Models. ☆28 · Mar 15, 2025 · Updated 11 months ago
- [TVCG 2026] Official repo of "DreamBarbie: Text to Barbie-Style 3D Avatars" ☆29 · Updated this week
- PreciseCam: Precise Camera Control for Text-to-Image Generation ☆25 · May 7, 2025 · Updated 9 months ago
- ☆11 · Jan 8, 2025 · Updated last year
- ☆13 · Jul 10, 2024 · Updated last year
- ☆34 · Mar 18, 2025 · Updated 11 months ago
- ☆46 · Dec 30, 2024 · Updated last year
- All tools developed by myself for personal purposes. ☆16 · Feb 1, 2026 · Updated 3 weeks ago
- [ICCV 2025] Official repo of "StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors" ☆32 · Dec 30, 2025 · Updated last month
- UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation ☆18 · Aug 12, 2025 · Updated 6 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆37 · Nov 10, 2024 · Updated last year
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 5 months ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025] ☆45 · Jul 22, 2025 · Updated 7 months ago
- ☆23 · Jul 20, 2025 · Updated 7 months ago
- On Path to Multimodal Generalist: General-Level and General-Bench ☆18 · Jul 11, 2025 · Updated 7 months ago
- Vision Large Language Models trained on M3IT instruction tuning dataset ☆17 · Aug 16, 2023 · Updated 2 years ago
- [CVPR 2025] Official Implementation of MotionPro: A Precise Motion Controller for Image-to-Video Generation ☆146 · Dec 29, 2025 · Updated 2 months ago
- [CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Long… ☆321 · Mar 30, 2025 · Updated 11 months ago
- [CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation ☆45 · Jul 1, 2025 · Updated 7 months ago
- [NOTE] I do not have enough resources to maintain VMS; please use Ostris's AI-Toolkit instead ☆43 · Oct 3, 2025 · Updated 4 months ago
- Implementation of the Mesh-VQVAE of "VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space" - ECCV 2024 ☆17 · Oct 30, 2024 · Updated last year
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption" ☆38 · May 21, 2025 · Updated 9 months ago
- ☆42 · Jul 9, 2025 · Updated 7 months ago
- [ICCV 2025] TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆38 · Nov 27, 2024 · Updated last year
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models. ☆27 · Nov 18, 2025 · Updated 3 months ago
- official implementation of the paper "Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability". ☆46 · Dec 25, 2025 · Updated 2 months ago
- UniVid: The Open-Source Unified Video Model ☆30 · Oct 13, 2025 · Updated 4 months ago
- [CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆82 · Feb 13, 2026 · Updated 2 weeks ago