VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
☆352 · Updated 6 months ago
Alternatives and similar repositories for VIRL
Users interested in VIRL are comparing it to the repositories listed below.
- [ECCV 2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ☆289 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆443 · Updated 6 months ago
- Pandora: Towards General World Model with Natural Language Actions and Video States ☆504 · Updated 8 months ago
- Official repo and evaluation implementation of VSI-Bench ☆492 · Updated 3 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆278 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆314 · Updated last year
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆128 · Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents ☆216 · Updated last month
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ☆285 · Updated 8 months ago
- Long Context Transfer from Language to Vision ☆378 · Updated 2 months ago
- Compose multimodal datasets ☆393 · Updated this week
- Official implementation of the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training" ☆280 · Updated last month
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆142 · Updated 9 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆612 · Updated 8 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆342 · Updated 4 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆289 · Updated 2 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…" ☆438 · Updated 2 weeks ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning ☆178 · Updated last month
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆222 · Updated 7 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆330 · Updated last month
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆67 · Updated last month
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆125 · Updated last year
- ControlLLM: Augment Language Models with Tools by Searching on Graphs ☆191 · Updated 10 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ☆330 · Updated 3 months ago
- Open Platform for Embodied Agents ☆318 · Updated 4 months ago
- Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024] ☆215 · Updated 2 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆252 · Updated last month
- Official implementation of the ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ☆209 · Updated last year