jinhaoduan / GTBench
[NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
☆66Updated last year
Alternatives and similar repositories for GTBench
Users who are interested in GTBench are comparing it to the repositories listed below.
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…☆47Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆106Updated 2 months ago
- ☆100Updated last year
- ☆144Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆145Updated 10 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆121Updated last year
- ☆86Updated last year
- ☆116Updated 8 months ago
- ☆122Updated last year
- [ICML 2024 Oral] A framework for society simulation that supports complex simulation, for example: multi-scene.☆79Updated last year
- The official implementation of Self-Exploring Language Models (SELM)☆64Updated last year
- How to create rational LLM-based agents? Using game-theoretic workflows!☆74Updated 4 months ago
- Repository for the paper Stream of Search: Learning to Search in Language☆151Updated 8 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆78Updated last year
- This repository contains an LLM benchmark for the social deduction game "Resistance Avalon"☆127Updated 4 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆37Updated last year
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆129Updated 11 months ago
- Natural Language Reinforcement Learning☆97Updated 2 months ago
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024☆139Updated 7 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆110Updated last year
- ☆41Updated 10 months ago
- ☆63Updated 7 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View☆118Updated 4 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆59Updated last year
- ☆46Updated 3 months ago
- ☆123Updated 7 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"☆63Updated 10 months ago
- Reasoning with Language Model is Planning with World Model☆175Updated 2 years ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆41Updated last year
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆48Updated 11 months ago