NJU-PCALab/InstanceCap

[CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍

☆45

Alternatives and similar repositories for InstanceCap

Users who are interested in InstanceCap are comparing it to the repositories listed below.

NJU-PCALab / CoDi
CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation
☆36 • Aug 1, 2025 • Updated 11 months ago
NJU-PCALab / MotionSight
[ICLR 2026] MotionSight's official code implementation.
☆48 • Apr 24, 2026 • Updated 2 months ago
NJU-PCALab / UltraHR-100k
This is the official repository of UltraHR-100K.
☆45 • Nov 21, 2025 • Updated 8 months ago
NJU-PCALab / TextCrafter
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
☆97 • Nov 26, 2025 • Updated 7 months ago
NJU-PCALab / OpenVid-1M
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
☆452 • May 30, 2025 • Updated last year
NJU-PCALab / RAG-Diffusion
[ICCV 2025] Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥
☆622 • Dec 12, 2025 • Updated 7 months ago
TencentYoutuResearch / T2I-L2P
Code for "L2P: Unlocking Latent Potential for Pixel Generation"
☆179 • Jul 11, 2026 • Updated last week
NJU-PCALab / AddSR
☆120 • Jan 8, 2025 • Updated last year
GXNU-ZhongLab / RSTrack
Explicit Context Reasoning with Supervision for Visual Tracking (ACM MM 25)
☆18 • Jul 20, 2025 • Updated last year
NJU-PCALab / DiP
[CVPR 2026] DiP: Taming Diffusion Models in Pixel Space
☆71 • Jun 15, 2026 • Updated last month
NJU-PCALab / ERR
[CVPR 2025] Official code of "From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Persp…
☆60 • Apr 16, 2026 • Updated 3 months ago
NJU-PCALab / L2P
L2P: Unlocking Latent Potential for Pixel Generation
☆39 • May 22, 2026 • Updated last month
alibaba-damo-academy / DyDiT
The official implementation of "2025 ICLR Dynamic Diffusion Transformer" and "2025 ArXiv DyDiT++: Dynamic Diffusion Transformers for Efficie…
☆52 • Apr 10, 2025 • Updated last year
syguan96 / Novel-StyleGAN-Inversion-Papers
Interesting StyleGAN-related papers. Focusing on StyleGAN inversion.
☆16 • Jul 19, 2021 • Updated 5 years ago
PengWan-Yang / commonLocalization
☆17 • Nov 5, 2020 • Updated 5 years ago
VidCapBench / VidCapBench
☆13 • May 17, 2025 • Updated last year
AIGeeksGroup / UniVid
UniVid: The Open-Source Unified Video Model
☆32 • Oct 13, 2025 • Updated 9 months ago
TIGER-AI-Lab / VideoScore
official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]
☆121 • Dec 4, 2025 • Updated 7 months ago
NUS-HPC-AI-Lab / Dynamic-Diffusion-Transformer
☆96 • Mar 26, 2025 • Updated last year
si0wang / VisVM
☆46 • Dec 30, 2024 • Updated last year
zhouyiks / CoLVA
☆44 • Jul 9, 2025 • Updated last year
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆137 • Jan 30, 2026 • Updated 5 months ago
hqhQAQ / PatchDPO
[CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
☆46 • Jul 1, 2025 • Updated last year
g-fiche / Mesh-VQ-VAE
Implementation of the Mesh-VQVAE of "VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space" - ECCV 2024
☆18 • Oct 30, 2024 • Updated last year
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆40 • Nov 10, 2024 • Updated last year
PKU-YuanGroup / Edit-R1
Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback
☆294 • Jan 24, 2026 • Updated 5 months ago
qiumuyang / SIAB
Implementation of The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medic…
☆11 • May 12, 2025 • Updated last year
JIA-Lab-research / MagicMirror
[ICCV 2025] MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
☆131 • Jun 26, 2025 • Updated last year
Fr0zenCrane / Cockatiel
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
☆38 • May 21, 2025 • Updated last year
viddle-app / animatediff
Animatediff implementation. Includes a ControlNet pipeline.
☆19 • Dec 24, 2023 • Updated 2 years ago
showlab / DIM
[ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
☆28 • May 11, 2026 • Updated 2 months ago
jylins / hourllava
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19 • Jun 24, 2025 • Updated last year
WangWenhao0716 / TIP-I2V
[ICCV 2025] TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
☆41 • Nov 27, 2024 • Updated last year
xiefan-guo / i4vgen
[arXiv 2024] I4VGen: Image as Free Stepping Stone for Text-to-Video Generation
☆24 • Oct 6, 2024 • Updated last year
snap-research / VIMI
☆13 • Jul 10, 2024 • Updated 2 years ago
yunlong10 / CAT-V
[AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…
☆67 • Jan 27, 2026 • Updated 5 months ago
ysy31415 / direct_a_video
☆95 • May 25, 2024 • Updated 2 years ago
amazon-science / instruct-video-to-video
☆133 • Feb 13, 2024 • Updated 2 years ago
MC-E / InstructX
☆86 • Oct 10, 2025 • Updated 9 months ago