[ECCV2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
★297 · May 20, 2024 · Updated last year
Alternatives and similar repositories for Octopus
Users that are interested in Octopus are comparing it to the libraries listed below.
- Syphus: Automatic Instruction-Response Generation Pipeline ★14 · Dec 14, 2023 · Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ★26 · Jul 25, 2023 · Updated 2 years ago
- BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx ★1,377 · Updated this week
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ★1,186 · Jun 6, 2024 · Updated last year
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ★103 · Apr 30, 2024 · Updated last year
- [ECCV2022] New benchmark for evaluating pre-trained model; New supervised contrastive learning framework. ★110 · Dec 8, 2023 · Updated 2 years ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ★105 · Nov 9, 2023 · Updated 2 years ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ★658 · Jun 13, 2025 · Updated 9 months ago
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World ★478 · Apr 20, 2025 · Updated 11 months ago
- Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. ★461 · Jul 4, 2023 · Updated 2 years ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ★104 · Dec 25, 2025 · Updated 3 months ago
- Official implementation of GROOT, CoRL 2023 ★68 · Nov 4, 2023 · Updated 2 years ago
- ★644 · Feb 15, 2024 · Updated 2 years ago
- [arXiv 2023] Embodied Task Planning with Large Language Models ★193 · Aug 22, 2023 · Updated 2 years ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ★118 · Mar 13, 2025 · Updated last year
- A local AI assistant running on your device. It turns your files into actionable memory. ★55 · Updated this week
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs ★1,519 · Aug 19, 2024 · Updated last year
- [CVPR 2023 Award Candidate] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation ★519 · Sep 2, 2024 · Updated last year
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ★373 · Jun 23, 2024 · Updated last year
- [IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models ★951 · Nov 13, 2024 · Updated last year
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ★405 · Mar 19, 2025 · Updated last year
- Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22 ★471 · Apr 10, 2023 · Updated 2 years ago
- [TPAMI 2024] PERF: Panoramic Neural Radiance Field from a Single Panorama ★244 · Apr 14, 2024 · Updated last year
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" ★99 · May 8, 2025 · Updated 10 months ago
- ★120 · Jun 11, 2024 · Updated last year
- A generative and self-guided robotic agent that endlessly proposes and masters new skills. ★1,154 · May 31, 2024 · Updated last year
- Reading list for research topics in embodied vision ★702 · Jun 13, 2025 · Updated 9 months ago
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images ★58 · Dec 6, 2021 · Updated 4 years ago
- [ICLR 2024] Github Repo for "HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion" ★497 · Oct 14, 2023 · Updated 2 years ago
- ★28 · Nov 6, 2023 · Updated 2 years ago
- [SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images ★504 · Oct 7, 2025 · Updated 5 months ago
- Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation ★80 · May 31, 2023 · Updated 2 years ago
- Table top manipulation calibration between the robot arm, the fixed cameras and the camera in hand. ★11 · Apr 12, 2024 · Updated last year
- [NeurIPS 2023] InsActor: Instruction-driven Physics-based Characters ★141 · Feb 12, 2026 · Updated last month
- Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts" ★846 · Apr 18, 2024 · Updated last year
- An open-source framework for training large multimodal models. ★4,079 · Aug 31, 2024 · Updated last year
- Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024) ★3,130 · May 3, 2024 · Updated last year
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ★234 · Dec 22, 2023 · Updated 2 years ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ★20 · May 22, 2025 · Updated 10 months ago