VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
⭐ 315 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for VIRL
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ⭐ 396 · Updated 7 months ago
- 🐙 Octopus: an embodied vision-language model trained with RLEF that excels at embodied visual planning and programming. ⭐ 264 · Updated 6 months ago
- Compose multimodal datasets ⭐ 211 · Updated this week
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ⭐ 301 · Updated 7 months ago
- Open Platform for Embodied Agents ⭐ 271 · Updated last month
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ⭐ 236 · Updated 2 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…" ⭐ 359 · Updated this week
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ⭐ 141 · Updated 3 weeks ago
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ⭐ 124 · Updated 2 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ⭐ 582 · Updated 2 months ago
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models (arXiv 2023 / CVPR 2024) ⭐ 261 · Updated 7 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States ⭐ 478 · Updated last month
- Long Context Transfer from Language to Vision ⭐ 334 · Updated 3 weeks ago
- Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning." ⭐ 263 · Updated last month
- Official implementation of the ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ⭐ 190 · Updated last year
- Official repo for "Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning" ⭐ 204 · Updated last month
- ControlLLM: Augment Language Models with Tools by Searching on Graphs ⭐ 186 · Updated 4 months ago
- LLaVA-Interactive-Demo ⭐ 352 · Updated 3 months ago
- [ICLR 2024] Source code for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models" ⭐ 225 · Updated 3 weeks ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ⭐ 122 · Updated 3 weeks ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ⭐ 193 · Updated 10 months ago
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ⭐ 315 · Updated 4 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ⭐ 89 · Updated 2 months ago
- Official code for "VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding" (ECCV 2024) ⭐ 131 · Updated 2 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ⭐ 122 · Updated last week
- MMICL: a state-of-the-art VLM with in-context learning (ICL) ability, from PKU ⭐ 334 · Updated 11 months ago
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ⭐ 590 · Updated last month
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ⭐ 115 · Updated last week