willyfh / awesome-video-text-datasets
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
★31 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for awesome-video-text-datasets
- R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) · ★62 · Updated 4 months ago
- Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr… · ★116 · Updated 2 months ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties · ★117 · Updated this week
- [ICCV 2023] Accurate and Fast Compressed Video Captioning · ★34 · Updated 8 months ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) · ★91 · Updated last year
- An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval" · ★136 · Updated 7 months ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" · ★98 · Updated 9 months ago
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection · ★73 · Updated 3 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension · ★22 · Updated 10 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence · ★48 · Updated 3 months ago
- Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model · ★39 · Updated last year
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … · ★96 · Updated 3 months ago
- UniMD: Towards Unifying Moment retrieval and temporal action Detection · ★37 · Updated 4 months ago
- [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models · ★295 · Updated 5 months ago
- Official pytorch repository for "QD-DETR : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 … · ★204 · Updated 11 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding · ★40 · Updated 4 months ago
- Supercharged BLIP-2 that can handle videos · ★116 · Updated 11 months ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… · ★39 · Updated 2 months ago
- Source code of our MM'22 paper Partially Relevant Video Retrieval · ★51 · Updated last week