facebookresearch / EgoTV
EgoTV: Egocentric Task Verification from Natural Language Task Descriptions
☆27 · Updated last year
Alternatives and similar repositories for EgoTV
Users interested in EgoTV are comparing it to the repositories listed below.
- Official codebase for EmbCLIP · ☆126 · Updated 2 years ago
- A Python Package for Seamless Data Distribution in AI Workflows · ☆22 · Updated last year
- A Model for Embodied Adaptive Object Detection · ☆45 · Updated 2 years ago
- Code for NeurIPS 2022 Datasets and Benchmarks paper - EgoTaskQA: Understanding Human Tasks in Egocentric Videos. · ☆33 · Updated 2 years ago
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding" · ☆49 · Updated 5 months ago
- Visual Room Rearrangement · ☆118 · Updated last year
- Official implementation of Layout-aware Dreamer for Embodied Referring Expression Grounding [AAAI 23] · ☆17 · Updated 2 years ago
- General-purpose Visual Understanding Evaluation · ☆20 · Updated last year
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight) · ☆33 · Updated 2 years ago
- [CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024 · ☆127 · Updated 2 months ago
- ☆42 · Updated last year
- ☆49 · Updated last year
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision" · ☆28 · Updated last year
- Pytorch Code and Data for EnvEdit: Environment Editing for Vision-and-Language Navigation (CVPR 2022) · ☆32 · Updated 2 years ago
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" · ☆94 · Updated 2 months ago
- Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal tra… · ☆90 · Updated 2 years ago
- Code and models of MOCA (Modular Object-Centric Approach) proposed in "Factorizing Perception and Policy for Interactive Instruction Foll…" · ☆38 · Updated last year
- Code for TIDEE: Novel Room Reorganization using Visuo-Semantic Common Sense Priors · ☆38 · Updated last year
- ☆70 · Updated 7 months ago
- Code for "Interactive Task Planning with Language Models" · ☆30 · Updated 2 months ago
- Affordance Grounding from Demonstration Video to Target Image (CVPR 2023) · ☆44 · Updated 11 months ago
- Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation" · ☆79 · Updated last year
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations" · ☆53 · Updated last year
- Dataset and baseline for Scenario Oriented Object Navigation (SOON) · ☆18 · Updated 3 years ago
- ☆33 · Updated 2 years ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World · ☆130 · Updated 8 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks · ☆57 · Updated 9 months ago
- ☆60 · Updated 3 years ago
- Evaluate Multimodal LLMs as Embodied Agents · ☆53 · Updated 5 months ago
- [ICLR 2023] SQA3D for embodied scene understanding and reasoning · ☆135 · Updated last year