jinhaoduan / GTBench
[NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
☆58Updated 4 months ago
Alternatives and similar repositories for GTBench:
Users who are interested in GTBench are comparing it to the repositories listed below:
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆107Updated last month
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…☆42Updated 11 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆57Updated this week
- 🌾 OAT: Online AlignmenT for LLMs☆81Updated 3 weeks ago
- Natural Language Reinforcement Learning☆67Updated last month
- The official implementation of Self-Exploring Language Models (SELM)☆60Updated 7 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆77Updated 3 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"☆47Updated 2 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆109Updated 2 months ago
- Augmented LLM with self-reflection☆109Updated last year
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆134Updated last month
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆53Updated 10 months ago
- This repository contains an LLM benchmark for the social deduction game "Resistance Avalon"☆88Updated 4 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆106Updated 8 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control☆54Updated last week
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback☆100Updated last year
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆55Updated 8 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …☆126Updated 9 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI☆100Updated this week
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆52Updated 8 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆39Updated 2 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View☆106Updated 8 months ago
- How to create rational LLM-based agents? Using game-theoretic workflows!☆46Updated last month