microsoft / LLF-Bench
A benchmark for evaluating learning agents based on just language feedback
☆56 · Updated last month
Related projects
Alternatives and complementary repositories for LLF-Bench
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆213 · Updated 3 weeks ago
- ☆73 · Updated 4 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆120 · Updated 6 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆84 · Updated 2 months ago
- An OpenAI Gym environment to evaluate the ability of LLMs (e.g., GPT-4, Claude) in long-horizon reasoning and task planning in dynamic mult… ☆62 · Updated last year
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆49 · Updated 2 months ago
- ☆89 · Updated 4 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆102 · Updated 7 months ago
- ☆76 · Updated 10 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆92 · Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆94 · Updated 2 weeks ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆98 · Updated 4 months ago
- Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs). ☆195 · Updated this week
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆24 · Updated last month
- Reasoning with Language Model is Planning with World Model ☆144 · Updated last year
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆71 · Updated 10 months ago
- ☆202 · Updated last year
- ☆135 · Updated 6 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆96 · Updated last week
- A repository for transformer critique learning and generation ☆85 · Updated 11 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View ☆98 · Updated 5 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆106 · Updated 2 weeks ago
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆200 · Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆119 · Updated 2 weeks ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆77 · Updated 2 weeks ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆38 · Updated last month
- Super fast implementations of common benchmark text world games ☆43 · Updated this week
- ☆105 · Updated last week
- RL algorithm: Advantage induced policy alignment ☆62 · Updated last year
- ☆35 · Updated this week