[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering
☆17Feb 16, 2026Updated 3 months ago
Alternatives and similar repositories for ViTXT-GQA
Users that are interested in ViTXT-GQA are comparing it to the libraries listed below.
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆50Jun 19, 2025Updated 11 months ago
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement.☆15Mar 12, 2024Updated 2 years ago
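- This project implements real-time video face recognition with the MXNet framework in Python 3, including video transmission, face recognition, and other components that users can adapt as needed. The whole project was built on Ubuntu 18.04.☆16Dec 12, 2020Updated 5 years ago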
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching☆31May 29, 2025Updated last year
- This is the official code of the paper "Differentiable Cross Modal Hashing via Multimodal Transformers"☆18Mar 11, 2024Updated 2 years ago
- Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos☆17Mar 16, 2026Updated 2 months ago
- ☆15Aug 12, 2022Updated 3 years ago
- 【CVPR 2025】SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting☆17Jul 1, 2025Updated 10 months ago
- ☆12Oct 17, 2024Updated last year
- GCL implementation☆14Mar 7, 2024Updated 2 years ago
- [NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting☆71Oct 9, 2023Updated 2 years ago
- ☆16Apr 21, 2025Updated last year
- ☆14Sep 9, 2024Updated last year
- ☆33Jan 28, 2026Updated 4 months ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…☆13Jan 30, 2020Updated 6 years ago
- Evaluation for 3D reconstruction, includes monocular depth, video depth, relative camera pose & multi-view point map estimation.☆21Aug 26, 2025Updated 9 months ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆23Aug 1, 2025Updated 9 months ago
- We are very happy that our work has been accepted by ACM Multimedia 2024!🥰☆11Jan 8, 2025Updated last year
- Project page for "Morphology-Aware Interactive Keypoint Estimation" accepted in MICCAI 2022.☆13Sep 14, 2024Updated last year
- ☆10May 4, 2018Updated 8 years ago
- Code of the Grounded MUIE model, REAMO☆10Dec 3, 2024Updated last year
- [IEEE TMM] InstructHumans: Editing Animated 3D Human Textures with Instructions☆68Feb 28, 2026Updated 3 months ago
- ☆21Mar 5, 2025Updated last year
- A pytorch implementation of our paper Image Captioning with Inherent Sentiment (ICME 2021 Oral).☆11Jul 18, 2022Updated 3 years ago
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning.☆17Sep 2, 2025Updated 8 months ago
- code for paper `MemCap: Memorizing Style Knowledge for Image Captioning`☆11Mar 17, 2020Updated 6 years ago
- [ICML 2026] Elastic Diffusion Transformer: Accelerating SOTA generation models (e.g., Qwen-Image, Hunyuan3d) through adaptive computatio…☆42May 1, 2026Updated 3 weeks ago
- this repo contains some useful metadata for Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq☆15Jun 28, 2019Updated 6 years ago
- ☆27Oct 25, 2022Updated 3 years ago
- a py3 lib for NLP & image-caption metrics : BLEU METEOR CIDEr ROUGE SPICE WMD☆14Sep 13, 2022Updated 3 years ago
- ☆21Apr 10, 2024Updated 2 years ago
- Comprehensive benchmark for video text understanding☆28Jun 4, 2025Updated 11 months ago
- [CVPR 2025] Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents☆32Jun 3, 2025Updated 11 months ago
- The official implementation of "Cross-modal Causal Relation Alignment for Video Question Grounding. (CVPR 2025 Highlight)"☆50Apr 27, 2025Updated last year
- Project website of TE141K.☆17Mar 24, 2020Updated 6 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆19May 8, 2026Updated 3 weeks ago
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated 11 months ago
- [ICLR 2025] This repo is the official implementation of our paper "Learning Fine-Grained Representations through Textual Token Disentangl…☆23Jul 28, 2025Updated 10 months ago
- The official code of "CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval"☆15Sep 19, 2024Updated last year