JiuTian-VL / Optimus-1
[NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
☆70Updated 3 weeks ago
Alternatives and similar repositories for Optimus-1:
Users who are interested in Optimus-1 are comparing it to the repositories listed below:
- Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆118Updated last week
- ☆84Updated last month
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu…☆86Updated 2 months ago
- ☆54Updated this week
- Paper List of Minecraft Agents☆26Updated 3 weeks ago
- ☆37Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents☆195Updated last month
- (ICLR 2025) The Official Code Repository for GUI-World.☆53Updated 3 months ago
- [CVPR2025] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆177Updated 2 weeks ago
- Code for NeurIPS 2024 paper "AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning"☆39Updated 4 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆98Updated last month
- ☆55Updated 3 weeks ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆140Updated last week
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.☆53Updated last week
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆89Updated 5 months ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents☆195Updated last week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆131Updated 4 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control☆54Updated 2 months ago
- A Self-Training Framework for Vision-Language Reasoning☆73Updated 2 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆70Updated 4 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆33Updated 2 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆106Updated 8 months ago
- (VillagerAgent ACL 2024) A graph-based Minecraft multi-agent framework☆53Updated 2 months ago
- [CVPR2024] This is the official implementation of MP5☆99Updated 9 months ago
- Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse"☆57Updated last week
- ☆104Updated 2 months ago
- ☆44Updated last year
- ☆55Updated last month
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models☆180Updated 5 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization☆43Updated 3 weeks ago