VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
⭐ 360 · Updated 9 months ago
Alternatives and similar repositories for VIRL
Users who are interested in VIRL are comparing it to the libraries listed below.
- [ECCV2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. · ⭐ 293 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ⭐ 457 · Updated 9 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning · ⭐ 132 · Updated last year
- Pandora: Towards General World Model with Natural Language Actions and Video States · ⭐ 517 · Updated 11 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents · ⭐ 317 · Updated last year
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula… · ⭐ 95 · Updated 3 months ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs · ⭐ 144 · Updated last year
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models · ⭐ 278 · Updated last year
- Official repo and evaluation implementation of VSI-Bench · ⭐ 598 · Updated last month
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant · ⭐ 329 · Updated 6 months ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) · ⭐ 612 · Updated this week
- Compose multimodal datasets · ⭐ 478 · Updated last month
- OpenEQA: Embodied Question Answering in the Era of Foundation Models · ⭐ 316 · Updated last year
- ⭐ 626 · Updated last year
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment · ⭐ 26 · Updated last year
- Open Platform for Embodied Agents · ⭐ 329 · Updated 8 months ago
- ⭐ 84 · Updated last month
- Visual Planning: Let's Think Only with Images · ⭐ 271 · Updated 4 months ago
- Long Context Transfer from Language to Vision · ⭐ 393 · Updated 6 months ago
- [CVPR2024] This is the official implementation of MP5 · ⭐ 103 · Updated last year
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025] · ⭐ 77 · Updated 2 months ago
- (VillagerAgent ACL 2024) A Graph based Minecraft multi agents framework · ⭐ 76 · Updated 3 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … · ⭐ 189 · Updated 4 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025) · ⭐ 124 · Updated 3 months ago
- The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning". · ⭐ 138 · Updated last week
- ⭐ 142 · Updated 8 months ago
- [ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language. · ⭐ 315 · Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) · ⭐ 256 · Updated 9 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025) · ⭐ 44 · Updated 5 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… · ⭐ 494 · Updated 4 months ago