facebookresearch / open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
☆233 · Updated last month
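A quick way to get a feel for the benchmark before browsing the related projects is to load its question file and look at a few entries. The sketch below is a minimal example, assuming the questions ship as a JSON list at `data/open-eqa-v0.json` with `question`, `answer`, and `category` fields; the path and field names are assumptions, so check the repository's data directory for the actual layout.

```python
import json
from collections import Counter

# Minimal sketch of browsing the OpenEQA question set.
# The path and field names are assumptions; verify them against
# the repository's data/ directory before relying on this.
with open("data/open-eqa-v0.json") as f:
    questions = json.load(f)  # assumed: a list of question dicts

print(f"{len(questions)} questions loaded")

# Tally questions per (assumed) category field.
by_category = Counter(q.get("category", "unknown") for q in questions)
for category, count in by_category.most_common():
    print(f"{category:30s} {count}")

# Peek at one entry.
example = questions[0]
print("Q:", example.get("question"))
print("A:", example.get("answer"))
```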
Related projects
Alternatives and complementary repositories for open-eqa
- Compose multimodal datasets 🎹 ☆204 · Updated this week
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆123 · Updated 2 months ago
- [ICCV'23] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models ☆150 · Updated 4 months ago
- 🐙 Octopus: an embodied vision-language model trained with RLEF that excels at embodied visual planning and programming ☆263 · Updated 5 months ago
- A flexible and efficient codebase for training visually-conditioned language models (VLMs) ☆467 · Updated 4 months ago
- Official repository of Learning to Act from Actionless Videos through Dense Correspondences ☆168 · Updated 6 months ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ☆121 · Updated 2 weeks ago
- LLaRA: Large Language and Robotics Assistant ☆153 · Updated last month
- [ICLR 2024] Source code for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models" ☆223 · Updated 2 weeks ago
- [CVPR 2024] Code for the paper "Towards Learning a Generalist Model for Embodied Navigation" ☆121 · Updated 4 months ago
- [arXiv 2023] Embodied Task Planning with Large Language Models ☆155 · Updated last year
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆199 · Updated last month
- This repository compiles a list of papers related to the application of video technology in the field of robotics ☆121 · Updated 2 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆264 · Updated this week
- [ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model ☆341 · Updated last week
- Official code for VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆127 · Updated 2 months ago
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning ☆160 · Updated last month
- Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆255 · Updated 2 months ago
- Embodied Chain of Thought: a robotic policy that reasons to solve the task ☆87 · Updated 2 months ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆132 · Updated 3 weeks ago
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" ☆271 · Updated 9 months ago
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ☆333 · Updated 4 months ago
- A repository accompanying the PARTNR benchmark for using Large Planning Models (LPMs) to solve Human-Robot Collaboration or Robot Instruc… ☆48 · Updated this week
- Code for "Interactive Task Planning with Language Models" ☆25 · Updated last year