alexzhang13 / videogamebench
Benchmark environment for evaluating vision-language models (VLMs) on popular video games!
☆218 · Updated last week
Alternatives and similar repositories for videogamebench
Users who are interested in videogamebench often compare it to the repositories listed below.
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 4 months ago
- ☆151 · Updated last week
- ☆64 · Updated last month
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆492 · Updated this week
- GRadient-INformed MoE ☆262 · Updated 7 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆461 · Updated last week
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆103 · Updated 3 weeks ago
- Scaling Data for SWE-agents ☆160 · Updated this week
- prime-rl is a codebase for decentralized RL training at scale ☆211 · Updated this week
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆141 · Updated last week
- ☆111 · Updated 4 months ago
- Train your own SOTA deductive reasoning model ☆92 · Updated 2 months ago
- Computer gaming agents that run on your PC and laptops. ☆593 · Updated this week
- ☆139 · Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆97 · Updated 3 weeks ago
- ☆138 · Updated last month
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆327 · Updated 5 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆140 · Updated 2 months ago
- ☆125 · Updated last month
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆357 · Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆156 · Updated this week
- Build your own visual reasoning model ☆362 · Updated this week
- Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words ☆85 · Updated last week
- ☆84 · Updated 2 weeks ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆37 · Updated last week
- Exploring Applications of GRPO ☆212 · Updated last week
- II-Researcher: a new open-source framework designed to aid building search / research agents ☆248 · Updated last week
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆75 · Updated 2 weeks ago
- ☆147 · Updated last month
- Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse" ☆73 · Updated last month