[ECCV2024] 🐙 Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
☆295 May 20, 2024 Updated last year
Alternatives and similar repositories for Octopus
Users who are interested in Octopus are comparing it to the libraries listed below
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 Jul 25, 2023 Updated 2 years ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 Dec 14, 2023 Updated 2 years ago
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ☆1,181 Jun 6, 2024 Updated last year
- BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx ☆1,344 Updated this week
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆105 Nov 9, 2023 Updated 2 years ago
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World ☆477 Apr 20, 2025 Updated 10 months ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆102 Apr 30, 2024 Updated last year
- Official implementation of GROOT, CoRL 2023 ☆67 Nov 4, 2023 Updated 2 years ago
- ☆643 Feb 15, 2024 Updated 2 years ago
- ☆120 Jun 11, 2024 Updated last year
- [IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models ☆948 Nov 13, 2024 Updated last year
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ☆652 Jun 13, 2025 Updated 8 months ago
- Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. ☆462 Jul 4, 2023 Updated 2 years ago
- ☆28 Nov 6, 2023 Updated 2 years ago
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs ☆1,517 Aug 19, 2024 Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 Feb 27, 2025 Updated last year
- [ECCV2022] New benchmark for evaluating pre-trained model; New supervised contrastive learning framework. ☆110 Dec 8, 2023 Updated 2 years ago
- Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation ☆80 May 31, 2023 Updated 2 years ago
- [TPAMI 2024] PERF: Panoramic Neural Radiance Field from a Single Panorama ☆245 Apr 14, 2024 Updated last year
- [CVPR 2023 Award Candidate] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation ☆516 Sep 2, 2024 Updated last year
- Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024) ☆3,117 May 3, 2024 Updated last year
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ☆201 Jan 19, 2024 Updated 2 years ago
- [arXiv 2023] Embodied Task Planning with Large Language Models ☆193 Aug 22, 2023 Updated 2 years ago
- Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22 ☆469 Apr 10, 2023 Updated 2 years ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆864 May 8, 2025 Updated 9 months ago
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ☆373 Jun 23, 2024 Updated last year
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆118 Mar 13, 2025 Updated 11 months ago
- Instruction Following Agents with Multimodal Transformers ☆53 Nov 3, 2022 Updated 3 years ago
- [ICLR 2024] GitHub Repo for "HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion" ☆498 Oct 14, 2023 Updated 2 years ago
- A generative and self-guided robotic agent that endlessly proposes and masters new skills. ☆1,150 May 31, 2024 Updated last year
- HACMan++ code release. RSS 2024. ☆22 Dec 23, 2024 Updated last year
- A local AI assistant running on your device. It turns your files into actionable memory. ☆54 Feb 15, 2026 Updated 2 weeks ago
- ☆16 Apr 23, 2024 Updated last year
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… ☆3,338 Mar 5, 2024 Updated last year
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ☆233 Dec 22, 2023 Updated 2 years ago
- [ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper ☆167 May 7, 2024 Updated last year
- [SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images ☆506 Oct 7, 2025 Updated 4 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆399 Mar 19, 2025 Updated 11 months ago
- Code for "Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos" ☆28 Oct 25, 2021 Updated 4 years ago