[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆28Dec 28, 2023Updated 2 years ago
Alternatives and similar repositories for TextVR
Users that are interested in TextVR are comparing it to the libraries listed below
- Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling"☆19Nov 3, 2025Updated 3 months ago
- Edit and Generate Anything in 3D world!☆14Apr 15, 2023Updated 2 years ago
- ☆13Feb 26, 2025Updated last year
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)☆32Mar 29, 2024Updated last year
- This repo contains the code for the paper "Object-cropping for SSL".☆18Feb 14, 2023Updated 3 years ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆14Sep 1, 2022Updated 3 years ago
- A network that generates names, based on a conditional RNN: set a condition to generate different names.☆12May 15, 2017Updated 8 years ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆106Mar 28, 2024Updated last year
- Towards Video Text Visual Question Answering: Benchmark and Baseline☆40Feb 26, 2024Updated 2 years ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Oct 25, 2024Updated last year
- PyTorch implementation of the paper The Lottery Ticket Hypothesis for Object Recognition☆23Apr 22, 2021Updated 4 years ago
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"☆92Mar 9, 2025Updated 11 months ago
- undergraduate work☆21Mar 22, 2017Updated 8 years ago
- This repository shares current progress in transformer-based optical character recognition (OCR). Contributions are welcome.☆55Oct 9, 2023Updated 2 years ago
- Composed Video Retrieval☆62May 2, 2024Updated last year
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆28Sep 1, 2022Updated 3 years ago
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023)☆30Sep 5, 2023Updated 2 years ago
- Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models"☆27Nov 29, 2023Updated 2 years ago
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- We propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editi…☆32Aug 22, 2024Updated last year
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]☆380May 19, 2022Updated 3 years ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆66Jun 7, 2024Updated last year
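- This project restores images that have become blurred or noisy because of interference from the imaging device or environmental noise. It builds a web-based image restoration system: users upload the images to be restored, and the system processes them and returns the restored images. The project is based on a convolutional neural network (CNN) trained on large datasets to improve its processing ability, ultimately…☆14Jan 11, 2024Updated 2 years ago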
- [ECCV 2024] Official PyTorch implementation of TC-CLIP "Leveraging Temporal Contextualization for Video Action Recognition"☆89Feb 25, 2025Updated last year
- ☆83Jan 18, 2026Updated last month
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset☆298Mar 14, 2024Updated last year
- ☆34Mar 10, 2023Updated 2 years ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Mar 1, 2025Updated 11 months ago
- Use yolov5 to realize the road occupation operation and vehicle parking violation detection in urban streets, and can independently delin…☆12Jan 2, 2023Updated 3 years ago
- MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]☆22Dec 10, 2025Updated 2 months ago
- ☆21Dec 11, 2025Updated 2 months ago
- Smart campus☆10Aug 3, 2017Updated 8 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning☆40Jul 29, 2023Updated 2 years ago
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT …☆35Aug 10, 2023Updated 2 years ago
- [WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection☆16Mar 23, 2025Updated 11 months ago
- ☆10May 29, 2024Updated last year
- Python + OpenCV script to detect playing cards in an image. It uses template matching.☆13Jan 24, 2017Updated 9 years ago
- A replica of the original Disney friendly robot WALL-E☆12Feb 25, 2020Updated 6 years ago