Egocentric Video Understanding Dataset (EVUD)
☆33Jul 4, 2024Updated last year
Alternatives and similar repositories for EVUD
Users that are interested in EVUD are comparing it to the libraries listed below.
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆77Sep 12, 2024Updated last year
- Official PyTorch code of GroundVQA (CVPR'24)☆64Sep 13, 2024Updated last year
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆23Nov 25, 2025Updated 4 months ago
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)☆46Apr 9, 2025Updated 11 months ago
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation…☆39Feb 24, 2025Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆92Apr 30, 2024Updated last year
- ☆157Oct 31, 2024Updated last year
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"☆13Jun 11, 2023Updated 2 years ago
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2…☆24Jun 13, 2024Updated last year
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos☆34May 27, 2025Updated 10 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆53Jul 23, 2025Updated 8 months ago
- [CHI24] AI-Assisted In-Context Writing on OHMD During Travels☆11Dec 19, 2024Updated last year
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Oct 27, 2023Updated 2 years ago
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning☆82Dec 6, 2024Updated last year
- [AAAI 2023 Oral] Language-Assisted 3D Feature Learning for Semantic Scene Understanding☆12Aug 1, 2023Updated 2 years ago
- ☆37Sep 16, 2024Updated last year
- ☆23Aug 26, 2023Updated 2 years ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆41Apr 11, 2025Updated 11 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆46Apr 29, 2024Updated last year
- Pytorch implementation for Egoinstructor at CVPR 2024☆28Dec 1, 2024Updated last year
- [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.☆26Dec 30, 2024Updated last year
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆63Mar 25, 2025Updated last year
- A Holistic Embodied Cognition Benchmark☆19Apr 3, 2025Updated 11 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆86Oct 25, 2024Updated last year
- ☆194Oct 14, 2024Updated last year
- [ECCV 2024] VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement☆36Jul 29, 2024Updated last year
- ☆32Jul 29, 2024Updated last year
- This is the source code of PFRec☆14Dec 16, 2022Updated 3 years ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆132Nov 10, 2024Updated last year
- A Massive Multi-Discipline Lecture Understanding Benchmark☆33Nov 1, 2025Updated 4 months ago
- Official Implementation of HIMA (COLM'25)☆19Nov 25, 2025Updated 4 months ago
- ☆10Sep 12, 2024Updated last year
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs☆59Feb 2, 2026Updated last month
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆106Nov 28, 2024Updated last year
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)☆55Apr 15, 2024Updated last year
- OpenEQA Embodied Question Answering in the Era of Foundation Models☆343Sep 20, 2024Updated last year
- ☆109Dec 30, 2024Updated last year
- ☆40May 7, 2024Updated last year
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban…☆26Jul 15, 2025Updated 8 months ago