alexzhang13 / videogamebench

Benchmark environment for evaluating vision-language models (VLMs) on popular video games!

☆277

Alternatives and similar repositories for videogamebench

Users who are interested in videogamebench are comparing it to the repositories listed below.

casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 5 months ago
SakanaAI / RLT
Training teachers with reinforcement learning that can teach LLMs how to reason for test-time scaling.
☆122 Updated this week
ChenxinAn-fdu / POLARIS
Scaling RL on advanced reasoning models
☆308 Updated this week
NousResearch / atropos
Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …
☆514 Updated this week
groundlight / r1_vlm
Build your own visual reasoning model
☆385 Updated last week
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆156 Updated last month
menloresearch / visual-thinker
☆158 Updated last month
eqimp / hogwild_llm
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
☆109 Updated 2 months ago
cassidylaidlaw / minecraft-building-assistance-game
☆145 Updated 2 months ago
facebookresearch / MLGym
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
☆519 Updated last week
jerber / lang-jepa
☆114 Updated 6 months ago
PrimeIntellect-ai / prime-rl
prime-rl is a codebase for decentralized async RL training at scale
☆347 Updated this week
benchflow-ai / pokemon-gym
☆79 Updated 2 months ago
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆230 Updated last month
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140 Updated 4 months ago
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆489 Updated last month
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆102 Updated 2 months ago
seal-rg / recurrent-pretraining
Pretraining code for a large-scale depth-recurrent language model
☆783 Updated 2 weeks ago
HKUNLP / Dream
Dream 7B, a large diffusion language model
☆774 Updated last week
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆94 Updated 3 months ago
davidhershey / ClaudePlaysPokemonStarter
☆136 Updated 2 months ago
PrimeIntellect-ai / genesys
☆127 Updated 3 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆149 Updated 2 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆41 Updated last month
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆339 Updated 6 months ago
SakanaAI / evo-memory
Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.
☆312 Updated 8 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆184 Updated this week
jd-3d / SOLOBench
☆130 Updated last month
joel-simon / lluminate
☆71 Updated 2 weeks ago
JoeLi12345 / nGPT
An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆101 Updated 3 months ago