[CVPR'25] EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
☆50 · Jun 19, 2025 · Updated 11 months ago
Alternatives and similar repositories for EgoTextVQA
Users that are interested in EgoTextVQA are comparing it to the libraries listed below.
- [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering ☆17 · Feb 16, 2026 · Updated 3 months ago
- ☆15 · Aug 12, 2022 · Updated 3 years ago
- MAPLE infuses dexterous manipulation priors from egocentric videos into vision encoders, making their features well-suited for downstream… ☆32 · Dec 9, 2025 · Updated 5 months ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆15 · Apr 7, 2026 · Updated last month
- Human-centric environment representations from egocentric video ☆15 · Feb 5, 2026 · Updated 3 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆427 · Mar 19, 2025 · Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆89 · Jul 1, 2024 · Updated last year
- Code implementation of the paper 'ExpertAF: Expert Actionable Feedback from Video' ☆14 · Sep 30, 2025 · Updated 8 months ago
- VideoDirector [CVPR 2025] ☆36 · Nov 25, 2025 · Updated 6 months ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing." ☆28 · Jan 21, 2026 · Updated 4 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆63 · Sep 13, 2024 · Updated last year
- TTRV: Test-Time Reinforcement Learning for Vision-Language Models (CVPR 2026) ☆43 · Mar 8, 2026 · Updated 2 months ago
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21) ☆189 · Aug 2, 2025 · Updated 9 months ago
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement. ☆15 · Mar 12, 2024 · Updated 2 years ago
- PyTorch implementation for Egoinstructor at CVPR 2024 ☆28 · Dec 1, 2024 · Updated last year
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban… ☆30 · Jul 15, 2025 · Updated 10 months ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea… ☆13 · Jan 30, 2020 · Updated 6 years ago
- ☆14 · Feb 26, 2024 · Updated 2 years ago
- Evaluation for 3D reconstruction, includes monocular depth, video depth, relative camera pose & multi-view point map estimation. ☆21 · Aug 26, 2025 · Updated 9 months ago
- Official implementation of EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting ☆61 · Jun 20, 2025 · Updated 11 months ago
- ☆11 · Mar 11, 2025 · Updated last year
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆23 · Aug 1, 2025 · Updated 9 months ago
- Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆181 · Feb 25, 2025 · Updated last year
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆37 · May 27, 2025 · Updated last year
- ☆11 · Jul 19, 2023 · Updated 2 years ago
- We are very happy that our work has been accepted by ACM Multimedia 2024! 🥰 ☆11 · Jan 8, 2025 · Updated last year
- Official code for "RAG Meets Temporal Graphs: Time-Sensitive Modeling and Retrieval for Evolving Knowledge". ☆32 · Feb 25, 2026 · Updated 3 months ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆37 · Feb 21, 2026 · Updated 3 months ago
- ☆11 · Mar 31, 2025 · Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆54 · Jun 12, 2025 · Updated 11 months ago
- [ACM MM2025]: Unleashing the Power of Data Generation in One-Pass Outdoor LiDAR Localization ☆19 · Oct 29, 2025 · Updated 7 months ago
- ☆33 · Feb 12, 2026 · Updated 3 months ago
- ☆13 · Nov 28, 2021 · Updated 4 years ago
- ☆34 · Oct 16, 2025 · Updated 7 months ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding ☆58 · May 1, 2026 · Updated 3 weeks ago
- [ICLR2026] Spatial Reasoning with Vision-Language Models ☆53 · Jan 26, 2026 · Updated 4 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆24 · Apr 18, 2026 · Updated last month
- A public repository for ConDo (AAAI25 accepted) ☆10 · Dec 21, 2024 · Updated last year
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆151 · Aug 21, 2025 · Updated 9 months ago