VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
⭐321 · Updated last month
Alternatives and similar repositories for VIRL:
Users interested in VIRL are comparing it to the repositories listed below:
- Octopus, an embodied vision-language model trained with RLEF, which excels at embodied visual planning and programming ⭐278 · Updated 7 months ago
- Official repo and evaluation implementation of VSI-Bench ⭐326 · Updated this week
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ⭐409 · Updated last month
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ⭐306 · Updated 9 months ago
- Compose multimodal datasets ⭐261 · Updated last month
- Long Context Transfer from Language to Vision ⭐356 · Updated last month
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ⭐199 · Updated this week
- ⭐588 · Updated 11 months ago
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ⭐132 · Updated 4 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ⭐272 · Updated 9 months ago
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ⭐250 · Updated 3 months ago
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ⭐249 · Updated last month
- LLaVA-Interactive-Demo ⭐360 · Updated 5 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States ⭐492 · Updated 3 months ago
- Official implementation of the ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ⭐197 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024) ⭐596 · Updated 3 months ago
- Chat with NeRF enables users to interact with a NeRF model by typing in natural language ⭐305 · Updated 9 months ago
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions ⭐324 · Updated this week
- Open Platform for Embodied Agents ⭐280 · Updated this week
- MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning ⭐105 · Updated 8 months ago
- Official repo for LayoutGPT ⭐314 · Updated 9 months ago
- ⭐159 · Updated 6 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ⭐112 · Updated 6 months ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ⭐124 · Updated 2 months ago
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ⭐195 · Updated last month
- ⭐99 · Updated last week
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ⭐278 · Updated 2 months ago
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ⭐161 · Updated last month
- ⭐338 · Updated 2 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…" ⭐378 · Updated this week