VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
★360 · Updated 10 months ago
Alternatives and similar repositories for VIRL
Users who are interested in VIRL are comparing it to the repositories listed below.
- [ECCV2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ★292 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★459 · Updated 10 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States ★523 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ★133 · Updated this week
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ★334 · Updated 6 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ★316 · Updated last year
- OpenEQA Embodied Question Answering in the Era of Foundation Models ★319 · Updated last year
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ★144 · Updated last year
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula… ★95 · Updated 3 months ago
- Official repo and evaluation implementation of VSI-Bench ★604 · Updated 2 months ago
- Compose multimodal datasets ★485 · Updated 2 months ago
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ★77 · Updated 3 months ago
- ★628 · Updated last year
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment ★27 · Updated last year
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ★263 · Updated 2 months ago
- Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ★213 · Updated 2 years ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) ★627 · Updated 2 weeks ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ★277 · Updated last year
- Open Platform for Embodied Agents ★329 · Updated 8 months ago
- Long Context Transfer from Language to Vision ★394 · Updated 6 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★391 · Updated 5 months ago
- [CVPR2024] This is the official implementation of MP5 ★104 · Updated last year
- [ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language. ★315 · Updated last year
- ★88 · Updated 2 months ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ★199 · Updated last year
- (VillagerAgent ACL 2024) A Graph based Minecraft multi agents framework ★78 · Updated 3 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025) ★128 · Updated 4 months ago
- ★142 · Updated 9 months ago
- Paper collections of the continuous effort start from World Models. ★185 · Updated last year
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025) ★45 · Updated 5 months ago