[ECCV2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
☆300 · May 20, 2024 · Updated 2 years ago
Alternatives and similar repositories for Octopus
Users who are interested in Octopus are comparing it to the libraries listed below.
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Jul 25, 2023 · Updated 2 years ago
- BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx ☆1,469 · Updated this week
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ☆1,198 · Jun 6, 2024 · Updated last year
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆103 · Apr 30, 2024 · Updated 2 years ago
- [ECCV2022] New benchmark for evaluating pre-trained model; New supervised contrastive learning framework. ☆110 · Dec 8, 2023 · Updated 2 years ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆106 · Nov 9, 2023 · Updated 2 years ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ☆665 · Jun 13, 2025 · Updated 11 months ago
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World ☆484 · Apr 20, 2025 · Updated last year
- Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. ☆465 · Jul 4, 2023 · Updated 2 years ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆104 · Dec 25, 2025 · Updated 5 months ago
- Official implementation of GROOT, CoRL 2023 ☆71 · Nov 4, 2023 · Updated 2 years ago
- ☆648 · Feb 15, 2024 · Updated 2 years ago
- [arXiv 2023] Embodied Task Planning with Large Language Models ☆194 · Aug 22, 2023 · Updated 2 years ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆121 · Mar 13, 2025 · Updated last year
- A local AI assistant running on your device. It turns your files into actionable memory. ☆55 · Mar 24, 2026 · Updated 2 months ago
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs ☆1,533 · Aug 19, 2024 · Updated last year
- [ CVPR 2023 Award Candidate ] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation ☆527 · Sep 2, 2024 · Updated last year
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ☆375 · Jun 23, 2024 · Updated last year
- [IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models ☆953 · Nov 13, 2024 · Updated last year
- [TPAMI 2024] PERF: Panoramic Neural Radiance Field from a Single Panorama ☆244 · Apr 14, 2024 · Updated 2 years ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆427 · Mar 19, 2025 · Updated last year
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" ☆100 · May 8, 2025 · Updated last year
- ☆121 · Jun 11, 2024 · Updated last year
- A generative and self-guided robotic agent that endlessly propose and master new skills. ☆1,191 · May 31, 2024 · Updated last year
- Reading list for research topics in embodied vision ☆704 · Jun 13, 2025 · Updated 11 months ago
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images ☆58 · Dec 6, 2021 · Updated 4 years ago
- [ICLR 2024] Github Repo for "HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion" ☆496 · Oct 14, 2023 · Updated 2 years ago
- ☆28 · Nov 6, 2023 · Updated 2 years ago
- [SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images ☆504 · Oct 7, 2025 · Updated 7 months ago
- Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation ☆82 · May 31, 2023 · Updated 2 years ago
- Table top manipulation calibration between the robot arm, the fixed cameras and the camera in hand. ☆12 · Apr 12, 2024 · Updated 2 years ago
- [NeurIPS 2023] InsActor: Instruction-driven Physics-based Characters ☆139 · Feb 12, 2026 · Updated 3 months ago
- Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts" ☆852 · Apr 18, 2024 · Updated 2 years ago
- An open-source framework for training large multimodal models. ☆4,102 · Aug 31, 2024 · Updated last year
- Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024) ☆3,161 · May 3, 2024 · Updated 2 years ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆20 · May 22, 2025 · Updated last year
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ☆236 · Dec 22, 2023 · Updated 2 years ago
- Long Context Transfer from Language to Vision ☆403 · Mar 18, 2025 · Updated last year