alexzhang13 / videogamebench
Benchmark environment for evaluating vision-language models (VLMs) on popular video games!
☆277 · Updated 3 weeks ago
Alternatives and similar repositories for videogamebench
Users who are interested in videogamebench are comparing it to the repositories listed below:
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 5 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆122 · Updated this week
- Scaling RL on advanced reasoning models ☆308 · Updated this week
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆514 · Updated this week
- Build your own visual reasoning model ☆385 · Updated last week
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆156 · Updated last month
- ☆158 · Updated last month
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆109 · Updated 2 months ago
- ☆145 · Updated 2 months ago
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆519 · Updated last week
- ☆114 · Updated 6 months ago
- prime-rl is a codebase for decentralized async RL training at scale ☆347 · Updated this week
- ☆79 · Updated 2 months ago
- Exploring Applications of GRPO ☆230 · Updated last month
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆140 · Updated 4 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆489 · Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆102 · Updated 2 months ago
- Pretraining code for a large-scale depth-recurrent language model ☆783 · Updated 2 weeks ago
- Dream 7B, a large diffusion language model ☆774 · Updated last week
- Train your own SOTA deductive reasoning model ☆94 · Updated 3 months ago
- ☆136 · Updated 2 months ago
- ☆127 · Updated 3 months ago
- Open source interpretability artefacts for R1. ☆149 · Updated 2 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆41 · Updated last month
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆339 · Updated 6 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆312 · Updated 8 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆184 · Updated this week
- ☆130 · Updated last month
- ☆71 · Updated 2 weeks ago
- An open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆101 · Updated 3 months ago