[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering
☆17Feb 16, 2026Updated 4 months ago
Alternatives and similar repositories for ViTXT-GQA
Users who are interested in ViTXT-GQA are comparing it to the repositories listed below.
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆52Jun 19, 2025Updated 11 months ago
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement.☆15Mar 12, 2024Updated 2 years ago
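- This project implements real-time video face recognition using the MXNet framework on Python 3, including video transmission, face recognition, and other components; users can adapt it as needed. The whole project was built on Ubuntu 18.04.☆16Dec 12, 2020Updated 5 years ago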
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching☆32May 29, 2025Updated last year
- Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos☆17Mar 16, 2026Updated 3 months ago
- ☆15Aug 12, 2022Updated 3 years ago
- 【CVPR 2025】SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting☆17Jul 1, 2025Updated 11 months ago
- ☆12Oct 17, 2024Updated last year
- This project summarizes the CLIP-based cross-modal hashing methods. Including DCMHT, MITH, DSPH, DNPH, TwDH (Two-Step Discrete Hashing fo…☆52May 26, 2026Updated 3 weeks ago
- GCL implementation☆14Mar 7, 2024Updated 2 years ago
- FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients☆14Jan 22, 2025Updated last year
- ☆16Apr 21, 2025Updated last year
- [CVPR 2022] Accelerating Video Object Segmentation with Compressed Video☆41Jul 3, 2022Updated 3 years ago
- ☆14Sep 9, 2024Updated last year
- ☆36Jan 28, 2026Updated 4 months ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…☆13Jan 30, 2020Updated 6 years ago
- Evaluation for 3D reconstruction, includes monocular depth, video depth, relative camera pose & multi-view point map estimation.☆21Aug 26, 2025Updated 9 months ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆26Aug 1, 2025Updated 10 months ago
- We are very happy that our work has been accepted by ACM Multimedia 2024!🥰☆12Jan 8, 2025Updated last year
- ☆12Mar 31, 2025Updated last year
- Project page for "Morphology-Aware Interactive Keypoint Estimation" accepted in MICCAI 2022.☆13Sep 14, 2024Updated last year
- ☆10May 4, 2018Updated 8 years ago
- Code of the Grounded MUIE model, REAMO☆10Dec 3, 2024Updated last year
- [IEEE TMM] InstructHumans: Editing Animated 3D Human Textures with Instructions☆69Feb 28, 2026Updated 3 months ago
- ☆21Mar 5, 2025Updated last year
- A pytorch implementation of our paper Image Captioning with Inherent Sentiment (ICME 2021 Oral).☆11Jul 18, 2022Updated 3 years ago
- Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter [ECCV2024]☆26Mar 10, 2026Updated 3 months ago
- This repo contains code for Invariant Grounding for Video Question Answering☆27Mar 2, 2023Updated 3 years ago
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning.☆17Sep 2, 2025Updated 9 months ago
- code for paper `MemCap: Memorizing Style Knowledge for Image Captioning`☆11Mar 17, 2020Updated 6 years ago
- this repo contains some useful metadata for Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq☆15Jun 28, 2019Updated 6 years ago
- ☆21Apr 10, 2024Updated 2 years ago
- ☆27Oct 25, 2022Updated 3 years ago
- a py3 lib for NLP & image-caption metrics : BLEU METEOR CIDEr ROUGE SPICE WMD☆14Sep 13, 2022Updated 3 years ago
- Comprehensive benchmark for video text understanding☆29Jun 4, 2025Updated last year
- [CVPR 2025] Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents☆33Jun 3, 2025Updated last year
- Project website of TE141K.☆17Mar 24, 2020Updated 6 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆20May 8, 2026Updated last month
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated last year