[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering
☆17Feb 16, 2026Updated 2 months ago
Alternatives and similar repositories for ViTXT-GQA
Users that are interested in ViTXT-GQA are comparing it to the libraries listed below.
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆48Jun 19, 2025Updated 10 months ago
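- This project implements real-time video face recognition with the mxnet framework on python3, including video transmission, face recognition, and other components that users can adapt as needed. The whole project is built on Ubuntu 18.04.☆16Dec 12, 2020Updated 5 years ago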
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching☆31May 29, 2025Updated 11 months ago
- ☆15Aug 12, 2022Updated 3 years ago
- 【CVPR 2025】SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting☆16Jul 1, 2025Updated 10 months ago
- ☆12Oct 17, 2024Updated last year
- GCL implementation☆14Mar 7, 2024Updated 2 years ago
- This project summarizes the CLIP-based cross-modal hashing methods. Including DCMHT, MITH, DSPH, DNPH, TwDH (Two-Step Discrete Hashing fo…☆51Sep 15, 2025Updated 7 months ago
- [NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting☆71Oct 9, 2023Updated 2 years ago
- ☆16Apr 21, 2025Updated last year
- [CVPR 2022] Accelerating Video Object Segmentation with Compressed Video☆42Jul 3, 2022Updated 3 years ago
- ☆32Jan 28, 2026Updated 3 months ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…☆13Jan 30, 2020Updated 6 years ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆23Aug 1, 2025Updated 9 months ago
- ☆11Mar 31, 2025Updated last year
- Project page for "Morphology-Aware Interactive Keypoint Estimation" accepted in MICCAI 2022.☆13Sep 14, 2024Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆87Jul 1, 2024Updated last year
- ☆10May 4, 2018Updated 8 years ago
- Code of the Grounded MUIE model, REAMO☆10Dec 3, 2024Updated last year
- ☆34Oct 16, 2025Updated 6 months ago
- [IEEE TMM] InstructHumans: Editing Animated 3D Human Textures with Instructions☆68Feb 28, 2026Updated 2 months ago
- ☆21Mar 5, 2025Updated last year
- A pytorch implementation of our paper Image Captioning with Inherent Sentiment (ICME 2021 Oral).☆11Jul 18, 2022Updated 3 years ago
- This repo contains code for Invariant Grounding for Video Question Answering☆27Mar 2, 2023Updated 3 years ago
- Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter [ECCV2024]☆25Mar 10, 2026Updated last month
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning.☆17Sep 2, 2025Updated 8 months ago
- code for paper `MemCap: Memorizing Style Knowledge for Image Captioning`☆11Mar 17, 2020Updated 6 years ago
- this repo contains some useful metadata for Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq☆15Jun 28, 2019Updated 6 years ago
- ☆27Oct 25, 2022Updated 3 years ago
- a py3 lib for NLP & image-caption metrics : BLEU METEOR CIDEr ROUGE SPICE WMD