Benchmark environment for evaluating vision-language models (VLMs) on popular video games!
☆342 · May 30, 2025 · Updated 10 months ago
Alternatives and similar repositories for videogamebench
Users that are interested in videogamebench are comparing it to the libraries listed below.
- ☆33 · May 31, 2025 · Updated 10 months ago
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆243 · Updated this week
- [ICLR 2026] LLM/VLM gaming agents and model evaluation through games. ☆913 · Nov 16, 2025 · Updated 4 months ago
- speed-running solving robot manipulation tasks ☆24 · Oct 31, 2024 · Updated last year
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆102 · May 20, 2025 · Updated 10 months ago
- ☆67 · Feb 4, 2026 · Updated 2 months ago
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆166 · Nov 16, 2025 · Updated 4 months ago
- ☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models ☆19 · Jun 4, 2025 · Updated 10 months ago
- A Bimanual-mobile Robot Manipulation Dataset specifically designed for household applications ☆16 · Aug 12, 2024 · Updated last year
- ☆11 · Nov 18, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ☆1,391 · Mar 28, 2026 · Updated 2 weeks ago
- MLGym A New Framework and Benchmark for Advancing AI Research Agents ☆594 · Aug 10, 2025 · Updated 8 months ago
- diffusers with search engine ☆12 · Jan 13, 2026 · Updated 3 months ago
- We develop world models that can be adapted with natural language. Integrating these models into artificial agents allows humans to effe… ☆25 · Feb 10, 2024 · Updated 2 years ago
- ☆31 · Jun 25, 2024 · Updated last year
- Selected list of papers on World Models that I found interesting and/or useful. ☆37 · Feb 8, 2025 · Updated last year
- Kinematic and dynamic models of continuum and articulated soft robots. ☆16 · Nov 22, 2025 · Updated 4 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆11 · Sep 14, 2025 · Updated 7 months ago
- Repo for Paper "OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft" ☆29 · Apr 2, 2026 · Updated last week
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning ☆61 · Dec 18, 2025 · Updated 3 months ago
- ☆18 · Mar 28, 2023 · Updated 3 years ago
- Agentic RL Training at Scale ☆1,292 · Updated this week
- A simple visual test-time scaling method for GUI agent grounding ☆23 · Dec 7, 2025 · Updated 4 months ago
- ☆46 · Mar 31, 2025 · Updated last year
- [NeurIPS 2025] Frame In-N-Out: Unbounded Controllable Image-to-Video Generation ☆31 · Jan 5, 2026 · Updated 3 months ago
- Code for Scalable Offline Model-Based RL with Action chunking ☆21 · Feb 20, 2026 · Updated last month
- Modified Beam Search with periodical restart ☆12 · Sep 12, 2024 · Updated last year
- GROOT: Learning to Follow Instructions by Watching Gameplay Videos (ICLR'24, Spotlight) ☆67 · Dec 18, 2023 · Updated 2 years ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 · Apr 22, 2025 · Updated 11 months ago
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos ☆69 · Sep 5, 2025 · Updated 7 months ago
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆851 · Feb 11, 2026 · Updated 2 months ago
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Jun 4, 2025 · Updated 10 months ago
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆2,011 · Dec 6, 2024 · Updated last year
- GUI Grounding for Professional High-Resolution Computer Use ☆357 · Mar 4, 2026 · Updated last month
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆246 · May 5, 2024 · Updated last year
- ☆34 · Jan 4, 2026 · Updated 3 months ago
- ☆111 · Dec 10, 2025 · Updated 4 months ago
- Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?" ☆94 · May 23, 2023 · Updated 2 years ago