JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models
☆389Apr 8, 2024Updated last year
Alternatives and similar repositories for JARVIS-1
Users who are interested in JARVIS-1 are comparing it to the repositories listed below
- GROOT: Learning to Follow Instructions by Watching Gameplay Videos (ICLR'24, Spotlight)☆67Dec 18, 2023Updated 2 years ago
- Text world based on Minecraft rules.☆17May 13, 2024Updated last year
- Implementation of "Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction"☆46Aug 15, 2023Updated 2 years ago
- STEVE-1: A Generative Model for Text-to-Behavior in Minecraft☆204Jun 4, 2024Updated last year
- Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agen…☆290Aug 3, 2023Updated 2 years ago
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula…☆104Jun 16, 2025Updated 8 months ago
- ☆30Jun 25, 2024Updated last year
- [CVPR2024] This is the official implementation of MP5☆108Jun 30, 2024Updated last year
- An Open-Ended Embodied Agent with Large Language Models☆6,702Apr 3, 2024Updated last year
- Foundation Model for MineDojo☆296Apr 2, 2023Updated 2 years ago
- ☆87Dec 15, 2023Updated 2 years ago
- MineStudio: A Streamlined Package for Minecraft AI Agent Development☆348Feb 7, 2026Updated 3 weeks ago
- Paper List of Minecraft Agents☆55Aug 15, 2025Updated 6 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25)☆46Apr 13, 2025Updated 10 months ago
- We introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, le…☆27Apr 7, 2025Updated 10 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆95Jun 17, 2025Updated 8 months ago
- The official implementation of the paper "Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction".☆35Feb 10, 2024Updated 2 years ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆763Feb 1, 2024Updated 2 years ago
- Code for "Learning to Model the World with Language." ICML 2024 Oral.☆413Jan 7, 2026Updated last month
- Official Implementation of Paper "ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment" (AAAI'26)☆41Jul 2, 2025Updated 8 months ago
- ☆99Jun 12, 2024Updated last year
- ☆47Dec 11, 2023Updated 2 years ago
- Building Open-Ended Embodied Agents with Internet-Scale Knowledge☆2,152Mar 18, 2024Updated last year
- GPT-4V in Wonderland: LMMs as Smartphone Agents☆135Jul 17, 2024Updated last year
- ☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models☆19Jun 4, 2025Updated 9 months ago
- Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse"☆134Aug 27, 2025Updated 6 months ago
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.☆1,753Sep 9, 2024Updated last year
- Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning☆11Jul 20, 2022Updated 3 years ago
- Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memo…☆637Jun 5, 2023Updated 2 years ago
- Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs☆106Sep 30, 2025Updated 5 months ago
- Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos☆1,644Sep 3, 2025Updated 6 months ago
- [NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking☆267Jun 28, 2024Updated last year
- Chat²GPT is a ChatGPT (and DALL·E 2/3, and ElevenLabs) chat bot for Google Chat. 🤖💬☆11Feb 2, 2026Updated last month
- AI Search engine☆13Sep 24, 2025Updated 5 months ago
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents☆555Oct 28, 2023Updated 2 years ago
- ☆37Oct 21, 2025Updated 4 months ago
- [COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild☆4,716Nov 18, 2024Updated last year
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf…☆25Oct 14, 2024Updated last year
- [ECCV2024] 🐙Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.☆295May 20, 2024Updated last year