[ECCV2024] 🐙 Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
☆299 · May 20, 2024 · Updated last year
Alternatives and similar repositories for Octopus
Users that are interested in Octopus are comparing it to the libraries listed below.
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Jul 25, 2023 · Updated 2 years ago
- BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx ☆1,446 · Updated this week
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ☆1,192 · Jun 6, 2024 · Updated last year
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆103 · Apr 30, 2024 · Updated 2 years ago
- [ECCV2022] New benchmark for evaluating pre-trained model; New supervised contrastive learning framework. ☆110 · Dec 8, 2023 · Updated 2 years ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆106 · Nov 9, 2023 · Updated 2 years ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ☆662 · Jun 13, 2025 · Updated 10 months ago
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World ☆483 · Apr 20, 2025 · Updated last year
- Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. ☆463 · Jul 4, 2023 · Updated 2 years ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆104 · Dec 25, 2025 · Updated 4 months ago
- Official implementation of GROOT, CoRL 2023 ☆70 · Nov 4, 2023 · Updated 2 years ago
- ☆647 · Feb 15, 2024 · Updated 2 years ago
- [arXiv 2023] Embodied Task Planning with Large Language Models ☆195 · Aug 22, 2023 · Updated 2 years ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆120 · Mar 13, 2025 · Updated last year
- A local AI assistant running on your device. It turns your files into actionable memory. ☆55 · Mar 24, 2026 · Updated last month
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs ☆1,528 · Aug 19, 2024 · Updated last year
- [CVPR 2023 Award Candidate] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation ☆525 · Sep 2, 2024 · Updated last year
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ☆374 · Jun 23, 2024 · Updated last year
- [IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models ☆951 · Nov 13, 2024 · Updated last year
- [TPAMI 2024] PERF: Panoramic Neural Radiance Field from a Single Panorama ☆244 · Apr 14, 2024 · Updated 2 years ago
- Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22 ☆474 · Apr 10, 2023 · Updated 3 years ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆419 · Mar 19, 2025 · Updated last year
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" ☆100 · May 8, 2025 · Updated 11 months ago
- ☆121 · Jun 11, 2024 · Updated last year
- Reading list for research topics in embodied vision ☆704 · Jun 13, 2025 · Updated 10 months ago
- A generative and self-guided robotic agent that endlessly proposes and masters new skills. ☆1,169 · May 31, 2024 · Updated last year
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images ☆58 · Dec 6, 2021 · Updated 4 years ago
- [ICLR 2024] Github Repo for "HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion" ☆497 · Oct 14, 2023 · Updated 2 years ago
- ☆28 · Nov 6, 2023 · Updated 2 years ago
- [SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images ☆504 · Oct 7, 2025 · Updated 7 months ago
- Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation ☆82 · May 31, 2023 · Updated 2 years ago
- Table top manipulation calibration between the robot arm, the fixed cameras and the camera in hand. ☆11 · Apr 12, 2024 · Updated 2 years ago
- [NeurIPS 2023] InsActor: Instruction-driven Physics-based Characters ☆139 · Feb 12, 2026 · Updated 2 months ago
- Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts" ☆847 · Apr 18, 2024 · Updated 2 years ago
- An open-source framework for training large multimodal models. ☆4,088 · Aug 31, 2024 · Updated last year
- Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024) ☆3,151 · May 3, 2024 · Updated 2 years ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆20 · May 22, 2025 · Updated 11 months ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ☆235 · Dec 22, 2023 · Updated 2 years ago