VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
⭐343 · Updated 4 months ago
Alternatives and similar repositories for VIRL:
Users interested in VIRL are comparing it to the libraries listed below:
- [ECCV 2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ⭐285 · Updated 11 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ⭐205 · Updated 2 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ⭐434 · Updated 4 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ⭐311 · Updated last year
- Open Platform for Embodied Agents ⭐308 · Updated 3 months ago
- Official repo and evaluation implementation of VSI-Bench ⭐463 · Updated last month
- ⭐94 · Updated 2 weeks ago
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ⭐141 · Updated 7 months ago
- Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning" ⭐348 · Updated 2 months ago
- Official implementation of the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training" ⭐263 · Updated last month
- Long Context Transfer from Language to Vision ⭐372 · Updated last month
- Pandora: Towards General World Model with Natural Language Actions and Video States ⭐502 · Updated 7 months ago
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ⭐339 · Updated 4 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ⭐262 · Updated last month
- [NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ⭐87 · Updated 2 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ⭐104 · Updated 3 weeks ago
- Compose multimodal datasets ⭐340 · Updated 2 weeks ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ⭐278 · Updated last year
- [CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ⭐182 · Updated 2 weeks ago
- ⭐610 · Updated last year
- LLaVA-Interactive-Demo ⭐368 · Updated 8 months ago
- [ICLR 2025] VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation ⭐299 · Updated this week
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ⭐131 · Updated 5 months ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs ⭐192 · Updated 9 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ⭐320 · Updated last month
- ⭐182 · Updated 9 months ago
- Paper collections of the continuing effort starting from World Models. ⭐170 · Updated 9 months ago
- Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long c… ⭐286 · Updated 2 weeks ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ⭐420 · Updated last week
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ⭐272 · Updated 7 months ago