cnsdqd-dyb / VillagerAgent-Minecraft-multiagent-framework
A Minecraft multi-agent framework
☆34 Updated last week
Related projects
Alternatives and complementary repositories for VillagerAgent-Minecraft-multiagent-framework
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆55 Updated last month
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆75 Updated 2 months ago
- Official implementation of MIA-DPO ☆32 Updated last week
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ☆122 Updated 2 weeks ago
- Official repo for StableLLAVA ☆90 Updated 10 months ago
- ☆61 Updated last week
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆129 Updated 2 weeks ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆41 Updated 2 weeks ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆38 Updated 3 months ago
- [CVPR2024] This is the official implementation of MP5 ☆83 Updated 4 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆96 Updated 6 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 Updated 4 months ago
- The code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆69 Updated 2 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆104 Updated 3 weeks ago
- ☆27 Updated last week
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment ☆17 Updated 7 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆35 Updated 3 weeks ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆82 Updated 4 months ago
- This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Co… ☆71 Updated 4 months ago
- Official implementation of "Self-Improving Video Generation" ☆49 Updated this week
- [NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆21 Updated 3 weeks ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆42 Updated last week
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆29 Updated last month
- ⛏💎 STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆28 Updated 10 months ago
- ☆59 Updated last month
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆57 Updated last month
- Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆20 Updated last month
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated 2 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆48 Updated last month