[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
☆46Jun 19, 2025Updated 8 months ago
Alternatives and similar repositories for EgoTextVQA
Users that are interested in EgoTextVQA are comparing it to the libraries listed below
- [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering☆16Feb 16, 2026Updated 3 weeks ago
- This is the official code of the paper "Differentiable Cross Modal Hashing via Multimodal Transformers"☆18Mar 11, 2024Updated last year
- MAPLE infuses dexterous manipulation priors from egocentric videos into vision encoders, making their features well-suited for downstream…☆29Dec 9, 2025Updated 3 months ago
- Human-centric environment representations from egocentric video☆14Feb 5, 2026Updated last month
- ☆15Aug 12, 2022Updated 3 years ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆16Jan 31, 2024Updated 2 years ago
- \infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆19Feb 14, 2025Updated last year
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban…☆26Jul 15, 2025Updated 7 months ago
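- This project implements real-time video face recognition with the MXNet framework on Python 3, including video transmission and face recognition components that users can adapt as needed. The whole project was built on Ubuntu 18.04.☆16Dec 12, 2020Updated 5 years ago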
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Jun 12, 2025Updated 8 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant☆403Mar 19, 2025Updated 11 months ago
- Pytorch implementation for Egoinstructor at CVPR 2024☆28Dec 1, 2024Updated last year
- VideoDirector [CVPR 2025]☆33Nov 25, 2025Updated 3 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos☆33May 27, 2025Updated 9 months ago
- Official PyTorch code of GroundVQA (CVPR'24)☆64Sep 13, 2024Updated last year
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- TStar is a unified temporal search framework for long-form video question answering☆88Sep 2, 2025Updated 6 months ago
- This project summarizes the CLIP-based cross-modal hashing methods. Including DCMHT, MITH, DSPH, DNPH, TwDH (Two-Step Discrete Hashing fo…☆48Sep 15, 2025Updated 5 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆38Jan 27, 2026Updated last month
- 3D Telecommunications project utilizing Holoportation technology to provide live volumetric capture. Used in one case to increase the re…☆19Feb 20, 2026Updated 2 weeks ago
- ☆41Sep 9, 2025Updated 6 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆83Jul 1, 2024Updated last year
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆142Aug 21, 2025Updated 6 months ago
- ☆14Jul 11, 2024Updated last year
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models☆36Feb 21, 2026Updated 2 weeks ago
- ☆23Dec 11, 2025Updated 2 months ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing."☆25Jan 21, 2026Updated last month
- [AAAI 2026 Poster] TOSC: Task-Oriented Shape Completion for Open-World Dexterous Grasp Generation from Partial Point Clouds☆19Feb 2, 2026Updated last month
- ☆10Oct 5, 2022Updated 3 years ago
- Portfolio with data science and machine learning projects I developed during my training in data science.☆10Jan 4, 2021Updated 5 years ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆41Apr 11, 2025Updated 10 months ago
- Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆180Feb 25, 2025Updated last year
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n…☆44Nov 8, 2024Updated last year
- The first reimplementation of paperswithcode website.☆87Sep 12, 2025Updated 5 months ago
- ☆10Jun 19, 2024Updated last year
- Unofficial code for "Fundamentals of Artificial Intelligence (High School Edition)"☆13May 25, 2021Updated 4 years ago
- Official implementation of paper "GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model", ICML 2025☆15Dec 25, 2025Updated 2 months ago
- An intelligent question-answering system built on langchain and chatglm6b, with support for custom corpora☆10Jun 25, 2023Updated 2 years ago
- Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis (ACCV 2022)☆10Jul 22, 2024Updated last year