jinhaoduan / GTBench
[NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
☆60 Updated 7 months ago
Alternatives and similar repositories for GTBench:
Users who are interested in GTBench are comparing it to the repositories listed below:
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆84 Updated last month
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆44 Updated last year
- ☆96 Updated 9 months ago
- ☆107 Updated 3 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆63 Updated 10 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆112 Updated 11 months ago
- ☆90 Updated 9 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆134 Updated 5 months ago
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv 2401.01335) ☆29 Updated last year
- Natural Language Reinforcement Learning ☆87 Updated 4 months ago
- This repository contains an LLM benchmark for the social deduction game "Resistance Avalon" ☆106 Updated 2 months ago
- ☆55 Updated last month
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆136 Updated last year
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆53 Updated 10 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆62 Updated 11 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆98 Updated last year
- ☆114 Updated 2 months ago
- Augmented LLM with self-reflection ☆119 Updated last year
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆45 Updated 3 weeks ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 Updated 10 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 7 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆135 Updated 5 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 5 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆47 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆79 Updated 3 weeks ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆103 Updated last year
- ☆142 Updated 11 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆69 Updated 10 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆107 Updated 3 weeks ago