VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
☆364 Updated 11 months ago
Alternatives and similar repositories for VIRL
Users who are interested in VIRL are comparing it to the repositories listed below.
- [ECCV2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ☆292 Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆460 Updated 11 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆134 Updated last month
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆318 Updated last year
- Pandora: Towards General World Model with Natural Language Actions and Video States ☆533 Updated last year
- Official repo and evaluation implementation of VSI-Bench ☆631 Updated 3 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆279 Updated last year
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆145 Updated last year
- OpenEQA Embodied Question Answering in the Era of Foundation Models ☆330 Updated last year
- ☆632 Updated last year
- Compose multimodal datasets ☆507 Updated 3 months ago
- Long Context Transfer from Language to Vision ☆397 Updated 8 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆345 Updated 8 months ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆265 Updated 3 months ago
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆77 Updated 4 months ago
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula… ☆97 Updated 5 months ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) ☆655 Updated last month
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆406 Updated 6 months ago
- A paper list for spatial reasoning ☆222 Updated this week
- Open Platform for Embodied Agents ☆333 Updated 10 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆247 Updated last year
- (VillagerAgent ACL 2024) A Graph based Minecraft multi agents framework ☆81 Updated 5 months ago
- Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training ☆310 Updated 6 months ago
- [ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language. ☆318 Updated last month
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ☆199 Updated last year
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆149 Updated last month
- ☆104 Updated 4 months ago
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment ☆26 Updated last year
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆365 Updated 2 months ago
- Visual Planning: Let's Think Only with Images ☆280 Updated 6 months ago