StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Paper.
☆145 · Updated 2 months ago
Alternatives and similar repositories for appworld:
Users interested in appworld are comparing it to the repositories listed below.
- Code for paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] · ☆125 · Updated 2 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents · ☆178 · Updated 6 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model · ☆110 · Updated 2 months ago
- AWM: Agent Workflow Memory · ☆241 · Updated 2 weeks ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" · ☆54 · Updated 11 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users · ☆215 · Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) · ☆116 · Updated 3 months ago
- Repo of paper "Free Process Rewards without Process Labels" · ☆123 · Updated last month
- Augmented LLM with self-reflection · ☆111 · Updated last year
- Code and example data for the paper: Rule Based Rewards for Language Model Safety · ☆178 · Updated 7 months ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… · ☆112 · Updated 8 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI · ☆103 · Updated last week
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] · ☆281 · Updated 9 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] · ☆135 · Updated 3 months ago
- A benchmark list for evaluating large language models · ☆80 · Updated 7 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision · ☆115 · Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. · ☆161 · Updated last week
- [ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning · ☆206 · Updated last month
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) · ☆128 · Updated 3 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" · ☆81 · Updated 4 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore" · ☆157 · Updated this week