[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆31Dec 28, 2023Updated 2 years ago
Alternatives and similar repositories for TextVR
Users that are interested in TextVR are comparing it to the libraries listed below.
- Code-Implementation-of-Super-Resolution-ZOO (image & video)☆10Jul 6, 2020Updated 5 years ago
- [ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization☆57Nov 10, 2023Updated 2 years ago
- Towards Video Text Visual Question Answering: Benchmark and Baseline☆41Feb 26, 2024Updated 2 years ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)☆32Mar 29, 2024Updated 2 years ago
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation☆13May 13, 2023Updated 3 years ago
- Edit and Generate Anything in 3D world!☆13Apr 15, 2023Updated 3 years ago
- Composed Video Retrieval☆62May 2, 2024Updated 2 years ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆114Mar 28, 2024Updated 2 years ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…☆13Jan 30, 2020Updated 6 years ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆25Aug 1, 2025Updated 10 months ago
- We are very happy that our work has been accepted by ACM Multimedia 2024!🥰☆12Jan 8, 2025Updated last year
- This repo contains the code for the paper "Object-cropping for SSL".☆18Feb 14, 2023Updated 3 years ago
- A curated list of deep learning resources for video-text retrieval.☆645Oct 20, 2023Updated 2 years ago
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"☆95Mar 9, 2025Updated last year
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]☆376May 19, 2022Updated 4 years ago
- Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr…☆154Aug 21, 2024Updated last year
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023)☆31Sep 5, 2023Updated 2 years ago
- [ICLR 2025] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval☆26Feb 13, 2025Updated last year
- 2d smoothed-particle hydrodynamics (SPH) fluid simulation on the GPU in WebGL 2.☆11Nov 8, 2022Updated 3 years ago
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM☆25Feb 10, 2026Updated 3 months ago
- Dual fisheye video stitching in Python3, forked from : https://github.com/cynricfu/dual-fisheye-video-stitching☆13Dec 20, 2018Updated 7 years ago
- [NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting☆71Oct 9, 2023Updated 2 years ago
- ☆21Mar 5, 2025Updated last year
- Agentic Keyframe Search for Video Question Answering☆18Apr 7, 2025Updated last year
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning.☆17Sep 2, 2025Updated 9 months ago
- this repo contains some useful metadata for Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq☆15Jun 28, 2019Updated 6 years ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆14Sep 1, 2022Updated 3 years ago
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset☆301Mar 14, 2024Updated 2 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- PyTorch implementation of the paper The Lottery Ticket Hypothesis for Object Recognition☆23Apr 22, 2021Updated 5 years ago
- Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning☆15Dec 12, 2023Updated 2 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆20May 8, 2026Updated last month
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization☆30Mar 6, 2026Updated 3 months ago
- Automatic panorama stitching with automatic camera calibration/distortion estimation☆19Oct 9, 2023Updated 2 years ago
- [CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want☆14Jan 5, 2025Updated last year
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated last year
- [ICLR 2025] This repo is the official implementation of our paper "Learning Fine-Grained Representations through Textual Token Disentangl…☆23Jul 28, 2025Updated 10 months ago