VIRL-Platform / VIRL

(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life

☆360

Alternatives and similar repositories for VIRL

Users who are interested in VIRL are comparing it to the repositories listed below.

dongyh20 / Octopus
[ECCV2024] 🐙Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
☆292 Updated last year
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆459 Updated 10 months ago
maitrix-org / Pandora
Pandora: Towards General World Model with Natural Language Actions and Video States
☆523 Updated last year
Chenyu-Wang567 / MLLM-Tool
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
☆133 Updated this week
EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆334 Updated 6 months ago
OpenGVLab / LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆316 Updated last year
facebookresearch / open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
☆319 Updated last year
zzxslp / SoM-LLaVA
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
☆144 Updated last year
Zhoues / MineDreamer
[IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula…
☆95 Updated 3 months ago
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆604 Updated 2 months ago
remyxai / VQASynth
Compose multimodal datasets 🎹
☆485 Updated 2 months ago
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77 Updated 3 months ago
allenai / unified-io-2
☆628 Updated last year
fudan-zvg / S-Agents
Official repository of S-Agents: Self-organizing Agents in Open-ended Environment
☆27 Updated last year
Yushi-Hu / VisualSketchpad
Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
☆263 Updated 2 months ago
3d-vista / 3D-VisTA
Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"
☆213 Updated 2 years ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆627 Updated 2 weeks ago
SHI-Labs / VCoder
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
☆277 Updated last year
thunlp / LEGENT
Open Platform for Embodied Agents
☆329 Updated 8 months ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆394 Updated 6 months ago
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆391 Updated 5 months ago
IranQin / MP5
[CVPR 2024] This is the official implementation of MP5
☆104 Updated last year
sled-group / chat-with-nerf
[ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
☆315 Updated last year
Gabesarch / grounded-rl
☆88 Updated 2 months ago
JeffWang987 / WorldDreamer
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
☆199 Updated last year
cnsdqd-dyb / VillagerAgent-Minecraft-multiagent-framework
(VillagerAgent ACL 2024) A graph-based Minecraft multi-agent framework
☆78 Updated 3 months ago
para-lost / AutoPresent
Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)
☆128 Updated 4 months ago
phyworld / phyworld
☆142 Updated 9 months ago
Timothyxxx / WorldModelPapers
A collection of papers on the continuing line of work that started with World Models.
☆185 Updated last year
CraftJarvis / ROCKET-1
Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025)
☆45 Updated 5 months ago