[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering
☆17Feb 16, 2026Updated 2 months ago
Alternatives and similar repositories for ViTXT-GQA
Users that are interested in ViTXT-GQA are comparing it to the libraries listed below.
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆47Jun 19, 2025Updated 10 months ago
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement.☆15Mar 12, 2024Updated 2 years ago
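- This project implements real-time video face recognition with the MXNet framework on Python 3, including video transmission, face recognition, and other components that users can adapt as needed. The whole project was built on Ubuntu 18.04.☆16Dec 12, 2020Updated 5 years ago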
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching☆31May 29, 2025Updated 10 months ago
- This is the official code of the paper "Differentiable Cross Modal Hashing via Multimodal Transformers"☆18Mar 11, 2024Updated 2 years ago
- ☆15Aug 12, 2022Updated 3 years ago
- [CVPR 2025] SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting☆16Jul 1, 2025Updated 9 months ago
- ☆12Oct 17, 2024Updated last year
- GCL implementation☆14Mar 7, 2024Updated 2 years ago
- This project summarizes the CLIP-based cross-modal hashing methods. Including DCMHT, MITH, DSPH, DNPH, TwDH (Two-Step Discrete Hashing fo…☆50Sep 15, 2025Updated 7 months ago
- [NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting☆70Oct 9, 2023Updated 2 years ago
- FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients☆14Jan 22, 2025Updated last year
- ☆16Apr 21, 2025Updated 11 months ago
- ☆32Jan 28, 2026Updated 2 months ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…☆13Jan 30, 2020Updated 6 years ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆21Aug 1, 2025Updated 8 months ago
- ☆10Mar 31, 2025Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆86Jul 1, 2024Updated last year
- ☆10May 4, 2018Updated 7 years ago
- Code of the Grounded MUIE model, REAMO☆11Dec 3, 2024Updated last year
- ☆32Oct 16, 2025Updated 6 months ago
- ☆21Mar 5, 2025Updated last year
- A PyTorch implementation of our paper Image Captioning with Inherent Sentiment (ICME 2021 Oral).☆11Jul 18, 2022Updated 3 years ago
- This repo contains code for Invariant Grounding for Video Question Answering☆27Mar 2, 2023Updated 3 years ago
- Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter [ECCV2024]☆25Mar 10, 2026Updated last month
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning.☆17Sep 2, 2025Updated 7 months ago
- code for paper `MemCap: Memorizing Style Knowledge for Image Captioning`☆11Mar 17, 2020Updated 6 years ago
- this repo contains some useful metadata for Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq☆15Jun 28, 2019Updated 6 years ago
- ☆27Oct 25, 2022Updated 3 years ago
- ☆21Apr 10, 2024Updated 2 years ago
- The official implementation of "Cross-modal Causal Relation Alignment for Video Question Grounding" (CVPR 2025 Highlight).☆49Apr 27, 2025Updated 11 months ago
- Project website of TE141K.☆17Mar 24, 2020Updated 6 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆18Jul 14, 2025Updated 9 months ago
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated 10 months ago
- The official code of "CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval"☆15Sep 19, 2024Updated last year
- [ICLR 2025] This repo is the official implementation of our paper "Learning Fine-Grained Representations through Textual Token Disentangl…☆23Jul 28, 2025Updated 8 months ago
- ☆20Jul 28, 2025Updated 8 months ago
- [ICCV'23] UATVR: Uncertainty-Adaptive Text-Video Retrieval☆13Nov 5, 2023Updated 2 years ago
- Official PyTorch Implementation of Bucketed Ranking-based Losses for Efficient Training of Object Detectors [ECCV2024]☆26Apr 27, 2025Updated 11 months ago