dongyh20 / Octopus
[ECCV 2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
★ 294 · Updated last year
Alternatives and similar repositories for Octopus
Users interested in Octopus are comparing it to the libraries listed below.
- (ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life · ★ 367 · Updated last year
- OpenEQA: Embodied Question Answering in the Era of Foundation Models · ★ 340 · Updated last year
- Open Platform for Embodied Agents · ★ 339 · Updated last year
- [ICLR 2024] Source code for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models" · ★ 293 · Updated 10 months ago
- ★ 46 · Updated 2 years ago
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula…" · ★ 102 · Updated 7 months ago
- [CVPR 2024] The official implementation of MP5 · ★ 106 · Updated last year
- Code for "Learning to Model the World with Language" (ICML 2024 Oral) · ★ 413 · Updated last month
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World · ★ 133 · Updated last year
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model · ★ 373 · Updated last year
- ★ 133 · Updated last year
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World · ★ 475 · Updated 9 months ago
- ★ 99 · Updated last year
- ★ 118 · Updated 10 months ago
- [ICCV'23] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models · ★ 214 · Updated 10 months ago
- [arXiv 2023] Embodied Task Planning with Large Language Models · ★ 193 · Updated 2 years ago
- GROOT: Learning to Follow Instructions by Watching Gameplay Videos (ICLR'24, Spotlight) · ★ 67 · Updated 2 years ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain · ★ 106 · Updated last year
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" · ★ 335 · Updated 2 years ago
- Official implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse" · ★ 127 · Updated 5 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ★ 458 · Updated last year
- [NeurIPS 2024] Official implementation of Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks · ★ 94 · Updated 7 months ago
- GPT-4V in Wonderland: LMMs as Smartphone Agents · ★ 135 · Updated last year
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents. · ★ 262 · Updated 3 months ago
- Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral) · ★ 278 · Updated 11 months ago
- Evaluate Multimodal LLMs as Embodied Agents · ★ 57 · Updated 11 months ago
- Towards Large Multimodal Models as Visual Foundation Agents · ★ 256 · Updated 9 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States · ★ 533 · Updated last year
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning · ★ 405 · Updated last year
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks · ★ 186 · Updated 4 months ago