VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
★355 · Updated 6 months ago
Alternatives and similar repositories for VIRL
Users interested in VIRL are comparing it to the repositories listed below.
- [ECCV 2024] Octopus, an embodied vision-language model trained with RLEF that excels at embodied visual planning and programming ★289 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★448 · Updated 6 months ago
- Official repo and evaluation implementation of VSI-Bench ★522 · Updated 3 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ★315 · Updated last year
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ★295 · Updated 3 months ago
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ★143 · Updated 10 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States ★506 · Updated 9 months ago
- MMICL (PKU), a state-of-the-art VLM with in-context learning ability ★352 · Updated last year
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★353 · Updated 2 months ago
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula…" ★91 · Updated last week
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ★278 · Updated last year
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ★291 · Updated 9 months ago
- Long Context Transfer from Language to Vision ★382 · Updated 3 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ★229 · Updated last year
- Towards Large Multimodal Models as Visual Foundation Agents ★216 · Updated 2 months ago
- Open Platform for Embodied Agents ★321 · Updated 5 months ago
- Official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ★215 · Updated 6 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions ★343 · Updated 5 months ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ★200 · Updated last year
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ★133 · Updated last month
- [NeurIPS'24] Implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ★211 · Updated 6 months ago
- Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024] ★218 · Updated 3 months ago
- Official implementation of the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training" ★281 · Updated last month
- ★128 · Updated 5 months ago
- Code for "Learning to Model the World with Language" (ICML 2024 Oral) ★388 · Updated last year
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ★230 · Updated 7 months ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs ★192 · Updated 11 months ago
- [ICLR 2024] Source code for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models" ★259 · Updated 2 months ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ★130 · Updated 8 months ago
- Code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ★68 · Updated 2 months ago