[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆28Dec 28, 2023Updated 2 years ago
Alternatives and similar repositories for TextVR
Users that are interested in TextVR are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆13Feb 26, 2025Updated last year
- Implementation of "Look, Listen and Recognise:character-aware audio-visual subtitling"☆20Nov 3, 2025Updated 5 months ago
- Code-Implementation-of-Super-Resolution-ZOO (image & video)☆10Jul 6, 2020Updated 5 years ago
- Towards Video Text Visual Question Answering: Benchmark and Baseline☆40Feb 26, 2024Updated 2 years ago
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation☆13May 13, 2023Updated 2 years ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Edit and Generate Anything in 3D world!☆14Apr 15, 2023Updated 2 years ago
- Composed Video Retrieval☆62May 2, 2024Updated last year
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆110Mar 28, 2024Updated 2 years ago
- This repo contains the code for the paper "Object-cropping for SSL".☆18Feb 14, 2023Updated 3 years ago
- A curated list of deep learning resources for video-text retrieval.☆645Oct 20, 2023Updated 2 years ago
- ☆10Mar 31, 2025Updated last year
- FishCam: A low-cost open source autonomous camera for aquatic research☆13Feb 9, 2026Updated 2 months ago
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]☆377May 19, 2022Updated 3 years ago
- Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr…☆153Aug 21, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- [ICLR 2025] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval☆26Feb 13, 2025Updated last year
- 2d smoothed-particle hydrodynamics (SPH) fluid simulation on the GPU in WebGL 2.☆11Nov 8, 2022Updated 3 years ago
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM☆23Feb 10, 2026Updated 2 months ago
- [NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting☆70Oct 9, 2023Updated 2 years ago
- Agentic Keyframe Search for Video Question Answering☆17Apr 7, 2025Updated last year
- ☆34Mar 10, 2023Updated 3 years ago
- this repo contains some useful metadata for Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq☆15Jun 28, 2019Updated 6 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆14Sep 1, 2022Updated 3 years ago
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset☆299Mar 14, 2024Updated 2 years ago
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization☆24Mar 6, 2026Updated last month
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆96Mar 1, 2025Updated last year
- Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning☆15Dec 12, 2023Updated 2 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆18Jul 14, 2025Updated 8 months ago
- Automatic panorama stitching with automatic camera calibration/distortion estimation☆19Oct 9, 2023Updated 2 years ago
- [CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want☆14Jan 5, 2025Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated 10 months ago
- The official code of "CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval"☆15Sep 19, 2024Updated last year
- [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering☆17Feb 16, 2026Updated last month
- This repository is created to share current progress of transformer based optical character recognition(OCR). Welcome to share~☆55Oct 9, 2023Updated 2 years ago
- [ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding☆375May 8, 2024Updated last year
- ☆15May 13, 2024Updated last year
- ☆10Feb 23, 2021Updated 5 years ago