vivek3141 / gg-bench
Measuring General Intelligence With Generated Games (Preprint)
☆25 · Updated last month
Alternatives and similar repositories for gg-bench
Users interested in gg-bench are comparing it to the repositories listed below.
- A benchmark to evaluate situated inductive reasoning ☆16 · Updated 6 months ago
- Learn online intrinsic rewards from LLM feedback ☆41 · Updated 6 months ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 · Updated 5 months ago
- ☆61 · Updated 4 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆103 · Updated last week
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆166 · Updated 2 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆60 · Updated 4 months ago
- OMNI: Open-endedness via Models of human Notions of Interestingness ☆50 · Updated 5 months ago
- Natural Language Reinforcement Learning ☆90 · Updated 6 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆18 · Updated 3 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆48 · Updated last week
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆35 · Updated last week
- This code accompanies the paper "Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration." ☆28 · Updated last month
- ☆27 · Updated last year
- Skill Design From AI Feedback ☆30 · Updated 4 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs) that uses a variety of games to test important LLM capabilities as agents. … ☆140 · Updated last year
- NeurIPS 2024 tutorial on LLM Inference ☆45 · Updated 7 months ago
- PoE-World: Compositional World Modeling with Products of Programmatic Experts ☆30 · Updated last week
- ☆54 · Updated 2 weeks ago
- ☆98 · Updated last year
- BASALT Benchmark datasets, evaluation code and agent training example ☆20 · Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆29 · Updated last year
- AIRA-dojo: a framework for developing and evaluating AI research agents ☆66 · Updated last week
- [COLM 2025] Code for the paper "Learning Adaptive Parallel Reasoning with Language Models" ☆114 · Updated 2 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆26 · Updated last month
- ☆114 · Updated 5 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆59 · Updated 2 months ago
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025) ☆60 · Updated 6 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆34 · Updated last year
- A testbed for agents and environments that can automatically improve models through data generation ☆24 · Updated 4 months ago