StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL'24 Best Resource Paper.
☆134Updated last month
Alternatives and similar repositories for appworld:
Users who are interested in appworld are comparing it to the repositories listed below:
- Benchmarking LLMs with Challenging Tasks from Real Users☆206Updated 2 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents☆163Updated 5 months ago
- augmented LLM with self reflection☆109Updated last year
- ☆120Updated 7 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆98Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆129Updated 2 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆53Updated 10 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym☆202Updated this week
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆153Updated last month
- LOFT: A 1 Million+ Token Long-Context Benchmark☆164Updated 2 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆126Updated 2 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆106Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated 11 months ago
- ☆205Updated 5 months ago
- ☆81Updated this week
- ☆89Updated this week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'☆175Updated last month
- AWM: Agent Workflow Memory☆231Updated last month
- A simple unified framework for evaluating LLMs☆164Updated 3 weeks ago
- UGround: Universal GUI Visual Grounding for GUI Agents☆138Updated this week
- Repo of paper "Free Process Rewards without Process Labels"☆94Updated this week
- Code and example data for the paper: Rule Based Rewards for Language Model Safety☆174Updated 5 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆109Updated 2 months ago
- Self-Alignment with Principle-Following Reward Models☆150Updated 10 months ago
- ☆135Updated 3 months ago
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)☆176Updated this week
- An Analytical Evaluation Board of Multi-turn LLM Agents☆270Updated 7 months ago
- A benchmark list for evaluation of large language models.☆76Updated 6 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆77Updated 2 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆120Updated 6 months ago