zhousheng97/EgoTextVQA

[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

☆52

Alternatives and similar repositories for EgoTextVQA

Users who are interested in EgoTextVQA are comparing it to the repositories listed below.

zhousheng97 / ViTXT-GQA
[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering
☆17 • Feb 16, 2026 • Updated 5 months ago
yl3800 / EIGV
☆15 • Aug 12, 2022 • Updated 3 years ago
EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆452 • Mar 19, 2025 • Updated last year
viridityzhu / RelaxFlow
[ICML 2026 Spotlight] RelaxFlow: Text-Driven Amodal 3D Generation
☆23 • May 21, 2026 • Updated 2 months ago
EIT-NLP / Awesome-Streaming-LLMs
🔥 This is a repository of paper list for streaming LLMs/MLLMs.
☆24 • Apr 19, 2026 • Updated 3 months ago
QC-LY / UiG
Code for "Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation"
☆15 • Nov 11, 2025 • Updated 8 months ago
monjurulkarim / ROL_Dataset
Risky Object Localization (ROL) in a Driving Scene Dataset
☆15 • Dec 24, 2023 • Updated 2 years ago
facebookresearch / worldsense
WorldSense benchmark for grounded reasoning in language models
☆25 • Nov 28, 2023 • Updated 2 years ago
Lil-Shake / VA-Pi
[CVPR 2026] This repository is the code of our paper "VA-Pi: Variational Policy Alignment for Pixel-Aware Autoregressive Generation"
☆15 • Mar 3, 2026 • Updated 4 months ago
xiaolul2 / DynFlowDrive
Code implementation of DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving
☆24 • Mar 23, 2026 • Updated 4 months ago
metadriverse / MetaVQA
☆23 • Aug 1, 2025 • Updated 11 months ago
EnVision-Research / PhysToolBench
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
☆30 • Jul 20, 2026 • Updated last week
iLearn-Lab / MM23-RTQ
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
☆15 • Apr 7, 2026 • Updated 3 months ago
GeWu-Lab / TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17 • Oct 25, 2024 • Updated last year
facebookresearch / ego-env
Human-centric environment representations from egocentric video
☆15 • Feb 5, 2026 • Updated 5 months ago
AdaCheng / VidEgoThink
The official code and data for paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"
☆18 • Mar 25, 2025 • Updated last year
Chiaraplizz / OSNOM
Official repository from the paper "Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind"
☆17 • Mar 18, 2025 • Updated last year
kalenforn / clip-based-cross-modal-hash
This project summarizes the CLIP-based cross-modal hashing methods. Including DCMHT, MITH, DSPH, DNPH, TwDH (Two-Step Discrete Hashing fo…
☆55 • May 26, 2026 • Updated 2 months ago
ligengen / EgoM2P
[ICCV 2025] The official implementation for EgoM2P: Egocentric Multimodal Multitask Pretraining.
☆38 • Jun 15, 2026 • Updated last month
JavisVerse / Awesome-AVI
Awesome Audio-Visual Intelligence, Survey of Audio-Visual Intelligence
☆84 • May 8, 2026 • Updated 2 months ago
AIM-SKKU / QA-TIGER
Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)
☆29 • Jun 6, 2025 • Updated last year
egolife-ai / Ego-R1
[TPAMI 2026] Ego-R1: Agentic Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆165 • Jun 10, 2026 • Updated last month
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆63 • Sep 13, 2024 • Updated last year
md-mohaiminul / BIMBA
☆29 • Jul 25, 2025 • Updated last year
yunlong10 / Video-R4
Reinforcing Text-Rich Video Reasoning with Visual Rumination
☆28 • Jun 5, 2026 • Updated last month
AdaCheng / EgoThink
[CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…
☆64 • Mar 25, 2025 • Updated last year
Lyman-Smoker / Awesome_Ego_Action
A curated list of Egocentric Action Understanding resources
☆59 • Nov 26, 2025 • Updated 8 months ago
yl3800 / IGV
This repo contains code for Invariant Grounding for Video Question Answering
☆27 • Mar 2, 2023 • Updated 3 years ago
Time-Search / TimeSearch-R
[ICLR 2026] Official code for paper: TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinf…
☆27 • Jan 29, 2026 • Updated 6 months ago
Becomebright / ReKV
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
☆122 • Nov 4, 2025 • Updated 8 months ago
pengzhansun / Counterfactual-Debiasing-Network
[ACM MM 2021] A causal perspective for compositional action recognition, providing a counterfactual debiasing inference implementation to…
☆20 • May 5, 2022 • Updated 4 years ago
mbzuai-oryx / Video-CoM
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
☆22 • Jun 17, 2026 • Updated last month
gabrielegoletto / AMEGO
Code for the paper "AMEGO: Active Memory from long EGOcentric videos" published at ECCV 2024
☆45 • Dec 7, 2024 • Updated last year
Synteraction-Lab / PANDALens
[CHI24] AI-Assisted In-Context Writing on OHMD During Travels
☆12 • Dec 19, 2024 • Updated last year
Hoar012 / TDC-Video
Official implementation of TDC.
☆15 • Jul 22, 2025 • Updated last year
algvr / maple
MAPLE infuses dexterous manipulation priors from egocentric videos into vision encoders, making their features well-suited for downstream…
☆34 • Dec 9, 2025 • Updated 7 months ago
path2generalist / General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
☆21 • Jul 11, 2025 • Updated last year
callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆32 • Dec 28, 2023 • Updated 2 years ago
Lliar-liar / Daily-Omni
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42 • Updated this week