callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆24 Updated last year
Alternatives and similar repositories for TextVR:
Users who are interested in TextVR are comparing it to the repositories listed below
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆41 Updated 3 weeks ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 7 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆18 Updated last month
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆68 Updated 3 months ago
- ☆19 Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method. ☆29 Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆63 Updated 4 months ago
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆26 Updated last month
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆38 Updated 2 weeks ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆99 Updated 11 months ago
- ☆17 Updated 9 months ago
- ☆46 Updated 7 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning