willyfh / awesome-video-text-datasets
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
☆36Updated last year
Alternatives and similar repositories for awesome-video-text-datasets:
Users who are interested in awesome-video-text-datasets are comparing it to the repositories listed below:
- ☆57Updated last year
- ☆72Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆26Updated last year
- ☆34Updated 7 months ago
- 🌀 R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)☆82Updated 10 months ago
- ☆31Updated last year
- An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"☆157Updated last year
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection☆89Updated 9 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 9 months ago
- [ICCV 2023] Accurate and Fast Compressed Video Captioning☆46Updated last year
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)☆100Updated 3 months ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆43Updated 4 months ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …☆101Updated 5 months ago
- ☆108Updated 2 years ago
- A Unified Framework for Video-Language Understanding☆57Updated last year
- Supercharged BLIP-2 that can handle videos☆117Updated last year
- [CVPR'23 Highlight] AutoAD: Movie Description in Context.☆96Updated 6 months ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆124Updated 6 months ago
- ☆133Updated last year
- ☆51Updated 11 months ago
- TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI2023 Oral]☆54Updated 2 years ago
- Official repo for StableLLAVA☆95Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆41Updated 2 years ago
- [CVPR2023] Code for "Streaming Video Model"☆78Updated last year
- A PyTorch implementation of EmpiricalMVM☆40Updated last year
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"☆86Updated 2 months ago
- Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Pe…☆123Updated last year
- ☆175Updated 2 years ago
- Video dataset dedicated to portrait-mode video recognition.☆48Updated 5 months ago
- Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr…☆131Updated 8 months ago