VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
⭐ 367 · Updated last year
Alternatives and similar repositories for VIRL
Users interested in VIRL are comparing it to the repositories listed below.
- [ECCV 2024] Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming · ⭐ 294 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ⭐ 458 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents · ⭐ 317 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning · ⭐ 138 · Updated 3 months ago
- OpenEQA: Embodied Question Answering in the Era of Foundation Models · ⭐ 339 · Updated last year
- Pandora: Towards General World Model with Natural Language Actions and Video States · ⭐ 533 · Updated last year
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs · ⭐ 145 · Updated last year
- Official repo and evaluation implementation of VSI-Bench · ⭐ 667 · Updated 5 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models · ⭐ 280 · Updated last year
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment · ⭐ 26 · Updated last year
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula… · ⭐ 100 · Updated 7 months ago
- ⭐ 642 · Updated last year
- Long Context Transfer from Language to Vision · ⭐ 398 · Updated 10 months ago
- Open Platform for Embodied Agents · ⭐ 337 · Updated last year
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant · ⭐ 378 · Updated 10 months ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) · ⭐ 687 · Updated 4 months ago
- [CVPR 2024] This is the official implementation of MP5 · ⭐ 106 · Updated last year
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper] · ⭐ 237 · Updated 3 weeks ago
- Compose multimodal datasets · ⭐ 542 · Updated 3 weeks ago
- ⭐ 114 · Updated 6 months ago
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning · ⭐ 79 · Updated last year
- (VillagerAgent ACL 2024) A graph-based Minecraft multi-agent framework · ⭐ 83 · Updated 7 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25) · ⭐ 46 · Updated 9 months ago
- Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training · ⭐ 315 · Updated 9 months ago
- Official implementation of SEED-LLaMA (ICLR 2024) · ⭐ 637 · Updated last year
- Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" · ⭐ 217 · Updated 2 years ago
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) · ⭐ 289 · Updated last year
- ⭐ 218 · Updated last year
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models · ⭐ 277 · Updated 5 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025) · ⭐ 151 · Updated 8 months ago