VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
☆334 · Updated 2 months ago
Alternatives and similar repositories for VIRL:
Users who are interested in VIRL are comparing it to the repositories listed below.
- Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ☆283 · Updated 9 months ago
- Official repo and evaluation implementation of VSI-Bench ☆388 · Updated 3 weeks ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆418 · Updated 2 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆306 · Updated 10 months ago
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆134 · Updated 5 months ago
- Compose multimodal datasets ☆283 · Updated 2 weeks ago
- ☆599 · Updated last year
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆229 · Updated 3 weeks ago
- Long Context Transfer from Language to Vision ☆360 · Updated 3 months ago
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ☆256 · Updated 5 months ago
- ☆160 · Updated 7 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆272 · Updated 10 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…" ☆391 · Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents ☆185 · Updated 2 weeks ago
- ☆112 · Updated last month
- Pandora: Towards General World Model with Natural Language Actions and Video States ☆498 · Updated 4 months ago
- Chat with NeRF enables users to interact with a NeRF model by typing in natural language. ☆307 · Updated 10 months ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, … ☆117 · Updated this week
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆284 · Updated 2 months ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ☆193 · Updated last year
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆328 · Updated last month
- Open Platform for Embodied Agents ☆293 · Updated last month
- Code for "Learning to Model the World with Language," ICML 2024 Oral. ☆378 · Updated last year
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆214 · Updated 2 months ago
- [TMLR 2024] Official code for the paper "Mantis: Multi-Image Instruction Tuning" ☆199 · Updated this week
- [ICLR 2024] Source code for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models" ☆244 · Updated 3 months ago
- [ICCV 2023] Official implementation of the paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ☆200 · Updated last year
- Paper: https://arxiv.org/abs/2307.02469 · Page: https://lynx-llm.github.io/ ☆238 · Updated last year
- ☆308 · Updated last year