[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆28 Dec 28, 2023 Updated 2 years ago
Alternatives and similar repositories for TextVR
Users that are interested in TextVR are comparing it to the libraries listed below
- ☆13 Feb 26, 2025 Updated last year
- Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling" ☆19 Nov 3, 2025 Updated 4 months ago
- Code-Implementation-of-Super-Resolution-ZOO (image & video) ☆10 Jul 6, 2020 Updated 5 years ago
- [ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization ☆57 Nov 10, 2023 Updated 2 years ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆32 Mar 29, 2024 Updated last year
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation ☆13 May 13, 2023 Updated 2 years ago
- Composed Video Retrieval ☆63 May 2, 2024 Updated last year
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer ☆109 Mar 28, 2024 Updated last year
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea… ☆13 Jan 30, 2020 Updated 6 years ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆20 Aug 1, 2025 Updated 7 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆80 Oct 25, 2024 Updated last year
- This repo contains the code for the paper "Object-cropping for SSL". ☆18 Feb 14, 2023 Updated 3 years ago
- A curated list of deep learning resources for video-text retrieval. ☆645 Oct 20, 2023 Updated 2 years ago
- ☆10 Mar 31, 2025 Updated 11 months ago
- Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr… ☆153 Aug 21, 2024 Updated last year
- [ICLR 2025] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval ☆26 Feb 13, 2025 Updated last year
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023) ☆30 Sep 5, 2023 Updated 2 years ago
- ☆34 Mar 10, 2023 Updated 3 years ago
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning. ☆17 Sep 2, 2025 Updated 6 months ago
- This repo contains some useful metadata for the Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq ☆15 Jun 28, 2019 Updated 6 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 Feb 2, 2025 Updated last year
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching". ☆14 Sep 1, 2022 Updated 3 years ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 Mar 1, 2025 Updated last year
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆24 Mar 6, 2026 Updated 2 weeks ago
- Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning ☆15 Dec 12, 2023 Updated 2 years ago
- [CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want ☆14 Jan 5, 2025 Updated last year
- The official code of "CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval" ☆15 Sep 19, 2024 Updated last year
- [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering ☆16 Feb 16, 2026 Updated last month
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆66 Jun 7, 2024 Updated last year
- This repository shares current progress in transformer-based optical character recognition (OCR). Contributions are welcome. ☆55 Oct 9, 2023 Updated 2 years ago
- ☆15 May 13, 2024 Updated last year
- [ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding ☆376 May 8, 2024 Updated last year
- ☆20 Jul 28, 2025 Updated 7 months ago
- [ICML 2024] "Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection" ☆13 Feb 15, 2025 Updated last year
- This repository implements the training, testing and evaluation code for the "VQ-NeRV: A Vector Quantised Neural Representation for Video… ☆10 Feb 19, 2024 Updated 2 years ago
- Realtime Face detection demo using YOLO v2 and OpenCV DNN module ☆17 Mar 10, 2018 Updated 8 years ago
- Official repository for the AAAI2026 paper (Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery … ☆22 Feb 4, 2026 Updated last month
- ☆28 May 16, 2023 Updated 2 years ago
- [ECCV 2024] Official PyTorch implementation of TC-CLIP "Leveraging Temporal Contextualization for Video Action Recognition" ☆89 Feb 25, 2025 Updated last year