guxu313 / TeViS
☆17Updated 7 months ago
Related projects
Alternatives and complementary repositories for TeViS
- Video dataset dedicated to portrait-mode video recognition.☆35Updated 7 months ago
- (wip) Use LAION-AI's CLIP "conditioned prior" to generate CLIP image embeds from CLIP text embeds.☆28Updated 2 years ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆43Updated 11 months ago
- [CVPR'23 Highlight] AutoAD: Movie Description in Context.☆87Updated this week
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆52Updated last year
- ☆55Updated 6 months ago
- ☆72Updated 6 months ago
- ☆47Updated last year
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …☆96Updated 3 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation☆77Updated 7 months ago
- ☆48Updated last year
- ☆30Updated last month
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆111Updated last month
- [ACM MM 2022]: Multi-Modal Experience Inspired AI Creation☆19Updated 5 months ago
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]☆48Updated last week
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆42Updated 2 weeks ago
- [ACL2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information☆12Updated 2 weeks ago
- A PyTorch implementation of EmpiricalMVM☆39Updated 10 months ago
- Learning to cut end-to-end pretrained modules☆28Updated 3 months ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L…☆48Updated 8 months ago
- [ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, M…☆17Updated last month
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆48Updated last year
- ☆101Updated last year
- Narrative movie understanding benchmark☆58Updated 6 months ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…☆76Updated last year
- TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering☆137Updated 6 months ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?☆37Updated 5 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆38Updated last week
- ☆30Updated 2 weeks ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆83Updated 3 weeks ago