vivek3141 / gg-bench
Measuring General Intelligence With Generated Games (Preprint)
☆25 · Updated last month
Alternatives and similar repositories for gg-bench
Users interested in gg-bench are comparing it to the repositories listed below.
- A benchmark to evaluate situated inductive reasoning ☆16 · Updated 6 months ago
- Learn online intrinsic rewards from LLM feedback ☆41 · Updated 6 months ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 · Updated 5 months ago
- ☆61 · Updated 4 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆103 · Updated last week
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆166 · Updated 2 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆60 · Updated 4 months ago
- OMNI: Open-endedness via Models of human Notions of Interestingness ☆50 · Updated 5 months ago
- Natural Language Reinforcement Learning ☆90 · Updated 6 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆18 · Updated 3 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆48 · Updated last week
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆35 · Updated last week
- This code accompanies the paper "Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration." ☆28 · Updated last month
- ☆27 · Updated last year
- Skill Design From AI Feedback ☆30 · Updated 4 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs) that uses a variety of games to test important LLM capabilities as agents. … ☆140 · Updated last year
- NeurIPS 2024 tutorial on LLM Inference ☆45 · Updated 7 months ago
- PoE-World: Compositional World Modeling with Products of Programmatic Experts ☆30 · Updated last week
- ☆54 · Updated 2 weeks ago
- ☆98 · Updated last year
- BASALT Benchmark datasets, evaluation code and agent training example ☆20 · Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆29 · Updated last year
- AIRA-dojo: a framework for developing and evaluating AI research agents ☆66 · Updated last week
- [COLM 2025] Code for the paper "Learning Adaptive Parallel Reasoning with Language Models" ☆114 · Updated 2 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆26 · Updated last month
- ☆114 · Updated 5 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆59 · Updated 2 months ago
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025) ☆60 · Updated 6 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆34 · Updated last year
- A testbed for agents and environments that can automatically improve models through data generation ☆24 · Updated 4 months ago