zhousheng97 / ViTXT-GQA
✨✨ Scene-Text Grounding for Text-Based Video Question Answering (arxiv)
☆14 Updated 3 weeks ago
Alternatives and similar repositories for ViTXT-GQA:
Users who are interested in ViTXT-GQA are comparing it to the repositories listed below:
- PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆63 Updated 9 months ago
- The official code of "Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval" (AAAI 2024) ☆29 Updated last year
- Source code of our CVPR 2024 paper TeachCLIP for Text-to-Video Retrieval ☆29 Updated last month
- [CVPR 2025] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆24 Updated 2 weeks ago
- UniMD: Towards Unifying Moment retrieval and temporal action Detection ☆43 Updated 8 months ago
- Composed Video Retrieval