VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
★362 · Updated 11 months ago
Alternatives and similar repositories for VIRL
Users interested in VIRL are comparing it to the libraries listed below.
- [ECCV 2024] Octopus, an embodied vision-language model trained with RLEF, excelling at embodied visual planning and programming. ★292 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation. ★460 · Updated 11 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents. ★317 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning. ★134 · Updated 3 weeks ago
- Pandora: Towards General World Model with Natural Language Actions and Video States. ★528 · Updated last year
- OpenEQA: Embodied Question Answering in the Era of Foundation Models. ★327 · Updated last year
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula…". ★95 · Updated 4 months ago
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs. ★144 · Updated last year
- ★628 · Updated last year
- Official repo and evaluation implementation of VSI-Bench. ★613 · Updated 2 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models. ★277 · Updated last year
- Open Platform for Embodied Agents. ★331 · Updated 9 months ago
- Compose multimodal datasets. ★497 · Updated 2 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant. ★341 · Updated 7 months ago
- [ICLR 2025] Code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks". ★77 · Updated 4 months ago
- Long Context Transfer from Language to Vision. ★394 · Updated 7 months ago
- ★99 · Updated 3 months ago
- [NeurIPS 2025] Official repository for "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…". ★143 · Updated last month
- [ICLR 2025] VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation. ★403 · Updated 6 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from PKU. ★356 · Updated last year
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment. ★26 · Updated last year
- Code for "Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models". ★263 · Updated 2 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ★352 · Updated 9 months ago
- ★30 · Updated last year
- [TMLR 2024] Official code for "Mantis: Multi-Image Instruction Tuning". ★231 · Updated 7 months ago
- Official implementation of the ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment". ★213 · Updated 2 years ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs. ★193 · Updated last year
- Code for "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025). ★132 · Updated 5 months ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. ★198 · Updated last year
- [ECCV 2024] STEVE, an embodied agent in Minecraft, from "See and Think: Embodied Agent in Virtual Environment". ★39 · Updated last year