VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
★313 Updated 4 months ago
Related projects
Alternatives and complementary repositories for VIRL
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★391 Updated 6 months ago
- Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ★263 Updated 5 months ago
- Compose multimodal datasets ★204 Updated this week
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ★300 Updated 6 months ago
- OpenEQA Embodied Question Answering in the Era of Foundation Models ★233 Updated last month
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ★123 Updated 2 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States ★477 Updated last month
- Open Platform for Embodied Agents ★263 Updated 3 weeks ago
- Chat with NeRF enables users to interact with a NeRF model by typing in natural language. ★303 Updated 6 months ago
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ★127 Updated 2 months ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ★121 Updated 2 weeks ago
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★120 Updated 2 weeks ago
- Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ★189 Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ★317 Updated 3 months ago
- ★287 Updated 9 months ago
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ★199 Updated last month
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024 ★261 Updated 6 months ago
- ★152 Updated 4 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ★353 Updated 3 weeks ago
- ★569 Updated 8 months ago
- ★258 Updated this week
- paper: https://arxiv.org/abs/2307.02469 page: https://lynx-llm.github.io/ ★229 Updated last year
- ★282 Updated 6 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ★264 Updated this week
- Long Context Transfer from Language to Vision ★328 Updated 2 weeks ago
- [ICLR 2024] Source codes for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models" ★223 Updated 2 weeks ago
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ★162 Updated 3 weeks ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ★193 Updated 9 months ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs ★186 Updated 3 months ago
- ★26 Updated this week