jinhaoduan / GTBench
[NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
☆58 Updated 5 months ago
Alternatives and similar repositories for GTBench:
Users who are interested in GTBench are comparing it to the repositories listed below:
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆73 Updated last month
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆43 Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆125 Updated 2 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆102 Updated last year
- ☆92 Updated last month
- ☆81 Updated last year
- The official implementation of Self-Exploring Language Models (SELM) ☆61 Updated 8 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆82 Updated 4 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆107 Updated 9 months ago
- ☆95 Updated 7 months ago
- Natural Language Reinforcement Learning ☆72 Updated 2 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆116 Updated 3 months ago
- augmented LLM with self reflection ☆112 Updated last year
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 Updated 8 months ago
- ☆38 Updated 3 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆27 Updated 11 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated 11 months ago
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆51 Updated 6 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆72 Updated last month
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆145 Updated 2 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆100 Updated 11 months ago
- [ICML 2024 Oral] A framework for society simulation that supports complex simulation, for example: multi-scene. ☆67 Updated 6 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆31 Updated last year
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆59 Updated 3 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆53 Updated 10 months ago
- ☆108 Updated 3 weeks ago
- ☆20 Updated 8 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆98 Updated 4 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆55 Updated last month