NJU-PCALab / InstanceCap
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
⭐28 · Updated 3 weeks ago
Alternatives and similar repositories for InstanceCap:
Users who are interested in InstanceCap are comparing it to the libraries listed below.
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] · ⭐66 · Updated last month
- CAR: Controllable AutoRegressive Modeling for Visual Generation · ⭐90 · Updated last month
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment · ⭐50 · Updated 2 weeks ago
- A collection of awesome autoregressive visual generation models · ⭐59 · Updated last week
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis · ⭐83 · Updated 5 months ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation · ⭐57 · Updated 4 months ago
- Official implementation of ControlVAR · ⭐84 · Updated last month
- Code for ROICtrl: Boosting Instance Control for Visual Generation · ⭐99 · Updated last month
- FQGAN: Factorized Visual Tokenization and Generation · ⭐39 · Updated this week
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient · ⭐75 · Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation · ⭐113 · Updated 2 months ago
- (no description) · ⭐124 · Updated 3 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" · ⭐45 · Updated 2 months ago
- [NeurIPS 2024] Video Diffusion Models are Training-free Motion Interpreter and Controller · ⭐31 · Updated 3 weeks ago
- [NeurIPS 2024] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation · ⭐58 · Updated 2 months ago
- (no description) · ⭐33 · Updated 2 months ago
- [CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models · ⭐63 · Updated 4 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… · ⭐34 · Updated last month
- Liquid: Language Models are Scalable Multi-modal Generators · ⭐57 · Updated 3 weeks ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics · ⭐62 · Updated 2 months ago
- Official code for "ControlAR: Controllable Image Generation with Autoregressive Models" · ⭐170 · Updated 2 weeks ago
- [ECCV 2024] Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning · ⭐40 · Updated 3 weeks ago
- (no description) · ⭐42 · Updated last week
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding · ⭐26 · Updated 2 months ago
- Open implementation of "RandAR" · ⭐46 · Updated last week
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" · ⭐39 · Updated last month
- (no description) · ⭐46 · Updated last week
- Code for FreeTraj, a tuning-free method for trajectory-controllable video generation · ⭐94 · Updated 5 months ago
- (no description) · ⭐220 · Updated 5 months ago
- [CVPR 2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization · ⭐88 · Updated 9 months ago