jiangjiechen / auction-arena
Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena"
☆40Updated 9 months ago
Related projects ⓘ
Alternatives and complementary repositories for auction-arena
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆49Updated 9 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"☆48Updated 5 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆63Updated last month
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆104Updated 6 months ago
- ☆63Updated 3 weeks ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆68Updated 5 months ago
- ☆112Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆75Updated last month
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments☆33Updated last month
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View☆102Updated 6 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 9 months ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"☆31Updated 4 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆50Updated 6 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆56Updated 8 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.☆76Updated 9 months ago
- Critique-out-Loud Reward Models☆38Updated last month
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆39Updated last month
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆99Updated 3 weeks ago
- ☆54Updated 2 months ago
- ☆72Updated 5 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆124Updated 3 weeks ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆86Updated this week
- official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"☆54Updated 11 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆50Updated 7 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆96Updated last month
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆115Updated this week
- ☆48Updated 8 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆35Updated last month
- This the implementation of LeCo☆27Updated 4 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆54Updated this week