[ECCV2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
★ 298 · May 20, 2024 · Updated last year
Alternatives and similar repositories for Octopus
Users that are interested in Octopus are comparing it to the libraries listed below.
- Syphus: Automatic Instruction-Response Generation Pipeline ★ 14 · Dec 14, 2023 · Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ★ 26 · Jul 25, 2023 · Updated 2 years ago
- BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx ★ 1,412 · Updated this week
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ★ 1,191 · Jun 6, 2024 · Updated last year
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ★ 103 · Apr 30, 2024 · Updated last year
- [ECCV2022] New benchmark for evaluating pre-trained model; New supervised contrastive learning framework. ★ 110 · Dec 8, 2023 · Updated 2 years ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ★ 106 · Nov 9, 2023 · Updated 2 years ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ★ 660 · Jun 13, 2025 · Updated 10 months ago
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World ★ 481 · Apr 20, 2025 · Updated 11 months ago
- Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. ★ 461 · Jul 4, 2023 · Updated 2 years ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ★ 104 · Dec 25, 2025 · Updated 3 months ago
- Official implementation of GROOT, CoRL 2023 ★ 68 · Nov 4, 2023 · Updated 2 years ago
- ★ 645 · Feb 15, 2024 · Updated 2 years ago
- [arXiv 2023] Embodied Task Planning with Large Language Models ★ 195 · Aug 22, 2023 · Updated 2 years ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ★ 118 · Mar 13, 2025 · Updated last year
- A local AI assistant running on your device. It turns your files into actionable memory. ★ 55 · Mar 24, 2026 · Updated 3 weeks ago
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs ★ 1,524 · Aug 19, 2024 · Updated last year
- [ CVPR 2023 Award Candidate ] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation ★ 524 · Sep 2, 2024 · Updated last year
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ★ 374 · Jun 23, 2024 · Updated last year
- [IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models ★ 951 · Nov 13, 2024 · Updated last year
- [TPAMI 2024] PERF: Panoramic Neural Radiance Field from a Single Panorama ★ 244 · Apr 14, 2024 · Updated 2 years ago
- Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22 ★ 472 · Apr 10, 2023 · Updated 3 years ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ★ 409 · Mar 19, 2025 · Updated last year
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" ★ 99 · May 8, 2025 · Updated 11 months ago
- ★ 120 · Jun 11, 2024 · Updated last year
- A generative and self-guided robotic agent that endlessly propose and master new skills. ★ 1,163 · May 31, 2024 · Updated last year
- Reading list for research topics in embodied vision ★ 703 · Jun 13, 2025 · Updated 10 months ago
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images ★ 58 · Dec 6, 2021 · Updated 4 years ago
- [ICLR 2024] Github Repo for "HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion" ★ 497 · Oct 14, 2023 · Updated 2 years ago
- ★ 28 · Nov 6, 2023 · Updated 2 years ago
- [SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images ★ 504 · Oct 7, 2025 · Updated 6 months ago
- Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation ★ 82 · May 31, 2023 · Updated 2 years ago
- Table top manipulation calibration between the robot arm, the fixed cameras and the camera in hand. ★ 11 · Apr 12, 2024 · Updated 2 years ago
- [NeurIPS 2023] InsActor: Instruction-driven Physics-based Characters ★ 139 · Feb 12, 2026 · Updated 2 months ago
- Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts" ★ 847 · Apr 18, 2024 · Updated last year
- An open-source framework for training large multimodal models. ★ 4,084 · Aug 31, 2024 · Updated last year
- Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024) ★ 3,141 · May 3, 2024 · Updated last year
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ★ 20 · May 22, 2025 · Updated 10 months ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ★ 235 · Dec 22, 2023 · Updated 2 years ago