[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆29 · Dec 28, 2023 · Updated 2 years ago
Alternatives and similar repositories for TextVR
Users that are interested in TextVR are comparing it to the libraries listed below.
- ☆13 · Feb 26, 2025 · Updated last year
- Implementation of "Look, Listen and Recognise: Character-aware audio-visual subtitling" ☆20 · Nov 3, 2025 · Updated 5 months ago
- Code-Implementation-of-Super-Resolution-ZOO (image & video) ☆10 · Jul 6, 2020 · Updated 5 years ago
- [ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization ☆57 · Nov 10, 2023 · Updated 2 years ago
- Towards Video Text Visual Question Answering: Benchmark and Baseline ☆41 · Feb 26, 2024 · Updated 2 years ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆32 · Mar 29, 2024 · Updated 2 years ago
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation ☆13 · May 13, 2023 · Updated 2 years ago
- Composed Video Retrieval ☆62 · May 2, 2024 · Updated last year
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer ☆111 · Mar 28, 2024 · Updated 2 years ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea… ☆13 · Jan 30, 2020 · Updated 6 years ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆23 · Aug 1, 2025 · Updated 8 months ago
- This repo contains the code for the paper "Object-cropping for SSL". ☆18 · Feb 14, 2023 · Updated 3 years ago
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs" ☆92 · Mar 9, 2025 · Updated last year
- A curated list of deep learning resources for video-text retrieval. ☆645 · Oct 20, 2023 · Updated 2 years ago
- ☆11 · Mar 31, 2025 · Updated last year
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21] ☆377 · May 19, 2022 · Updated 3 years ago
- Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr… ☆153 · Aug 21, 2024 · Updated last year
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023) ☆31 · Sep 5, 2023 · Updated 2 years ago
- [ICLR 2025] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval ☆26 · Feb 13, 2025 · Updated last year
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos" ☆29 · Aug 28, 2023 · Updated 2 years ago
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM ☆24 · Feb 10, 2026 · Updated 2 months ago
- [NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting ☆71 · Oct 9, 2023 · Updated 2 years ago
- ☆21 · Mar 5, 2025 · Updated last year
- ☆33 · Mar 10, 2023 · Updated 3 years ago
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning. ☆17 · Sep 2, 2025 · Updated 7 months ago
- This repo contains some useful metadata for the Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq ☆15 · Jun 28, 2019 · Updated 6 years ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching". ☆14 · Sep 1, 2022 · Updated 3 years ago
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset ☆300 · Mar 14, 2024 · Updated 2 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 · Feb 2, 2025 · Updated last year
- This repository contains the source codes for the paper: "SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environm… ☆16 · Oct 11, 2021 · Updated 4 years ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆96 · Mar 1, 2025 · Updated last year
- PyTorch implementation of the paper The Lottery Ticket Hypothesis for Object Recognition ☆23 · Apr 22, 2021 · Updated 5 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025 ☆18 · Jul 14, 2025 · Updated 9 months ago
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie… ☆23 · Jun 6, 2025 · Updated 10 months ago
- The official code of "CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval" ☆15 · Sep 19, 2024 · Updated last year
- [ICLR 2025] This repo is the official implementation of our paper "Learning Fine-Grained Representations through Textual Token Disentangl… ☆23 · Jul 28, 2025 · Updated 9 months ago
- [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering ☆17 · Feb 16, 2026 · Updated 2 months ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆66 · Jun 7, 2024 · Updated last year
- This repository is created to share current progress of transformer based optical character recognition(OCR). Welcome to share~ ☆55 · Oct 9, 2023 · Updated 2 years ago