kyegomez / SIMA
PyTorch implementation of DeepMind's SIMA: "Scaling Instructable Agents Across Many Simulated Worlds"
☆27 Updated last year
Alternatives and similar repositories for SIMA
Users who are interested in SIMA are comparing it to the repositories listed below.
- GROOT: Learning to Follow Instructions by Watching Gameplay Videos (ICLR'24, Spotlight) ☆66 Updated 2 years ago
- Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents" ☆42 Updated last year
- Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse" ☆115 Updated 4 months ago
- Implementation of the premier Text to Video model from OpenAI ☆56 Updated last year
- ☆46 Updated 2 years ago
- 🎮 Manipulates mobile phones just like how you would. Official code for "MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficien… ☆26 Updated 2 months ago
- Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?" ☆94 Updated 2 years ago
- ☆34 Updated 2 years ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆12 Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆47 Updated 10 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆91 Updated 6 months ago
- ☆28 Updated 2 years ago
- The official implementation of the paper "Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction". ☆34 Updated last year
- ☆30 Updated last year
- Computer-Use Agents as Judges for Generative UI ☆39 Updated last month
- A Data Source for Reasoning Embodied Agents ☆19 Updated 2 years ago
- ☆98 Updated last year
- Enhancement in Multimodal Representation Learning. ☆41 Updated last year
- ☆118 Updated 9 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆48 Updated 6 months ago
- A vast array of Multi-Modal Embodied Robotic Foundation Models! ☆27 Updated last year
- 😊 TPTT: Transforming Pretrained Transformers into Titans ☆49 Updated last month
- [ECCV2024] 🐙 Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ☆293 Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆39 Updated last year
- ☆19 Updated 11 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆46 Updated 6 months ago
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra… ☆53 Updated 2 years ago
- ☆56 Updated last year
- ☆20 Updated last year