[CVPR'25] EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
☆47 · Jun 19, 2025 · Updated 10 months ago
Alternatives and similar repositories for EgoTextVQA
Users that are interested in EgoTextVQA are comparing it to the libraries listed below.
- [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering ☆17 · Feb 16, 2026 · Updated 2 months ago
- ☆15 · Aug 12, 2022 · Updated 3 years ago
- MAPLE infuses dexterous manipulation priors from egocentric videos into vision encoders, making their features well-suited for downstream… ☆30 · Dec 9, 2025 · Updated 4 months ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆16 · Apr 7, 2026 · Updated last week
- Human-centric environment representations from egocentric video ☆14 · Feb 5, 2026 · Updated 2 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆20 · Feb 14, 2025 · Updated last year
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆416 · Mar 19, 2025 · Updated last year
- This project summarizes CLIP-based cross-modal hashing methods, including DCMHT, MITH, DSPH, DNPH, TwDH (Two-Step Discrete Hashing fo… ☆50 · Sep 15, 2025 · Updated 7 months ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing." ☆26 · Jan 21, 2026 · Updated 2 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 · Sep 13, 2024 · Updated last year
- TTRV: Test-Time Reinforcement Learning for Vision-Language Models (CVPR 2026) ☆37 · Mar 8, 2026 · Updated last month
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement. ☆15 · Mar 12, 2024 · Updated 2 years ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆34 · May 27, 2025 · Updated 10 months ago
- PyTorch implementation for Egoinstructor at CVPR 2024 ☆28 · Dec 1, 2024 · Updated last year
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban… ☆26 · Jul 15, 2025 · Updated 9 months ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea… ☆13 · Jan 30, 2020 · Updated 6 years ago
- Official implementation of EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting ☆59 · Jun 20, 2025 · Updated 9 months ago
- ☆11 · Mar 11, 2025 · Updated last year
- Video Chain of Thought, code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆181 · Feb 25, 2025 · Updated last year
- ☆11 · Jul 19, 2023 · Updated 2 years ago
- [ICLR2026] Spatial Reasoning with Vision-Language Models ☆44 · Jan 26, 2026 · Updated 2 months ago
- ☆59 · Apr 28, 2025 · Updated 11 months ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆37 · Feb 21, 2026 · Updated last month
- ☆10 · Mar 31, 2025 · Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆53 · Jun 12, 2025 · Updated 10 months ago
- [CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos' ☆13 · Jun 16, 2024 · Updated last year
- ☆30 · Feb 12, 2026 · Updated 2 months ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding ☆59 · Mar 24, 2026 · Updated 3 weeks ago
- ☆13 · Nov 28, 2021 · Updated 4 years ago
- ☆32 · Oct 16, 2025 · Updated 6 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆148 · Aug 21, 2025 · Updated 7 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆24 · Jan 1, 2026 · Updated 3 months ago
- Code accompanying the paper "Fine-Grained Visual Entailment" [ECCV 2022]. ☆11 · Oct 31, 2022 · Updated 3 years ago
- ☆41 · Sep 9, 2025 · Updated 7 months ago
- ☆21 · Mar 5, 2025 · Updated last year
- RESAnything: Attribute Prompting for Arbitrary Referring Segmentation ☆17 · Nov 28, 2025 · Updated 4 months ago
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning. ☆17 · Sep 2, 2025 · Updated 7 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆94 · Mar 23, 2026 · Updated 3 weeks ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" ☆21 · Jul 21, 2025 · Updated 8 months ago