archiki / ADaPT
Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models"
☆72 Updated 10 months ago
Related projects
Alternatives and complementary repositories for ADaPT
- ☆78 Updated 11 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆65 Updated 4 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆75 Updated last month
- Code for the paper 🌳 Tree Search for Language Model Agents ☆140 Updated 3 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆111 Updated 5 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆97 Updated last month
- ☆137 Updated 6 months ago
- ☆37 Updated this week
- ☆38 Updated 4 months ago
- ☆52 Updated 2 weeks ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆93 Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 Updated last month
- ☆112 Updated last month
- [NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking ☆252 Updated 4 months ago
- ☆22 Updated 2 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆49 Updated 9 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆111 Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆41 Updated last month
- Official Repo for UGround ☆100 Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆162 Updated last month
- Evaluating LLMs with CommonGen-Lite ☆85 Updated 8 months ago
- ☆74 Updated 3 weeks ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆87 Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆41 Updated 9 months ago
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆107 Updated last year
- ☆103 Updated 3 months ago
- ☆116 Updated 5 months ago
- ☆51 Updated 10 months ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆118 Updated last month