VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
⭐341 · Updated 3 months ago
Alternatives and similar repositories for VIRL:
Users interested in VIRL are comparing it to the libraries listed below.
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ⭐427 · Updated 3 months ago
- [ECCV 2024] 🐙 Octopus, an embodied vision-language model trained with RLEF, excelling at embodied visual planning and programming. ⭐285 · Updated 10 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ⭐308 · Updated 11 months ago
- Official repo and evaluation implementation of VSI-Bench ⭐423 · Updated last month
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ⭐276 · Updated 11 months ago
- Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long c… ⭐218 · Updated this week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ⭐249 · Updated 2 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States ⭐502 · Updated 6 months ago
- Compose multimodal datasets ⭐321 · Updated last week
- Open Platform for Embodied Agents ⭐301 · Updated 2 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ⭐60 · Updated this week
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ⭐140 · Updated 7 months ago
- ⭐603 · Updated last year
- Long Context Transfer from Language to Vision ⭐368 · Updated last week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ⭐81 · Updated this week
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ⭐328 · Updated 3 months ago
- ⭐122 · Updated 2 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ⭐404 · Updated 3 weeks ago
- Towards Large Multimodal Models as Visual Foundation Agents ⭐195 · Updated last month
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ⭐86 · Updated 2 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ⭐244 · Updated last week
- ⭐84 · Updated last month
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ⭐333 · Updated 2 months ago
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework. ⭐53 · Updated last week
- PyTorch implementation of "Genie: Generative Interactive Environments", Bruce et al. (2024). ⭐143 · Updated 7 months ago
- Official implementation of the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training" ⭐255 · Updated last month
- EVE Series: Encoder-Free Vision-Language Models from BAAI ⭐314 · Updated 3 weeks ago
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ⭐131 · Updated 4 months ago
- ⭐166 · Updated 8 months ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs ⭐191 · Updated 8 months ago