Egocentric Video Understanding Dataset (EVUD)
☆33Jul 4, 2024Updated last year
Alternatives and similar repositories for EVUD
Users that are interested in EVUD are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆77Sep 12, 2024Updated last year
- Official PyTorch code of GroundVQA (CVPR'24)☆64Sep 13, 2024Updated last year
- Official Implementation of ISR-DPO:Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆23Nov 25, 2025Updated 4 months ago
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)☆46Apr 9, 2025Updated last year
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation…☆39Feb 24, 2025Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆93Apr 30, 2024Updated last year
- ☆157Oct 31, 2024Updated last year
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2…☆24Jun 13, 2024Updated last year
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos☆34May 27, 2025Updated 10 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆54Jul 23, 2025Updated 8 months ago
- [CHI24] AI-Assisted In-Context Writing on OHMD During Travels☆11Dec 19, 2024Updated last year
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Oct 27, 2023Updated 2 years ago
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning☆81Dec 6, 2024Updated last year
- [AAAI 2023 Oral] Language-Assisted 3D Feature Learning for Semantic Scene Understanding☆12Aug 1, 2023Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- ☆23Aug 26, 2023Updated 2 years ago
- ☆37Sep 16, 2024Updated last year
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆40Apr 11, 2025Updated last year
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆46Apr 29, 2024Updated last year
- Pytorch implementation for Egoinstructor at CVPR 2024☆28Dec 1, 2024Updated last year
- [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.☆26Dec 30, 2024Updated last year
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆86Oct 25, 2024Updated last year
- ☆194Oct 14, 2024Updated last year
- Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.☆37Jan 20, 2024Updated 2 years ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- [ECCV 2024] VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement☆37Jul 29, 2024Updated last year
- ☆32Jul 29, 2024Updated last year
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆132Nov 10, 2024Updated last year
- A Massive Multi-Discipline Lecture Understanding Benchmark☆34Nov 1, 2025Updated 5 months ago
- Official Implementation of HIMA (COLM'25)☆19Nov 25, 2025Updated 4 months ago
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs☆60Feb 2, 2026Updated 2 months ago
- ☆10Sep 12, 2024Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆106Nov 28, 2024Updated last year
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)☆56Apr 15, 2024Updated 2 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- OpenEQA Embodied Question Answering in the Era of Foundation Models☆346Sep 20, 2024Updated last year
- ☆111Dec 30, 2024Updated last year
- ☆40May 7, 2024Updated last year
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban…☆26Jul 15, 2025Updated 9 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆56Jul 1, 2025Updated 9 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆133Nov 19, 2024Updated last year
- ☆23Jul 1, 2025Updated 9 months ago