microsoft / LLF-Bench

A benchmark for evaluating learning agents based on just language feedback

☆86

Alternatives and similar repositories for LLF-Bench

Users who are interested in LLF-Bench are comparing it to the repositories listed below.

microsoft / SmartPlay
SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …
☆140 Updated last year
abdulhaim / LMRL-Gym
☆99 Updated last year
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆280 Updated 3 weeks ago
BladeTransformerLLC / OvercookedGPT
An OpenAI gym environment to evaluate the ability of LLMs (e.g. GPT-4, Claude) in long-horizon reasoning and task planning in dynamic mult…
☆69 Updated 2 years ago
allenai / clin
☆84 Updated last year
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆140 Updated 8 months ago
conglu1997 / intelligent-go-explore
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
☆61 Updated 5 months ago
agentification / RAFA_code
☆143 Updated last year
Agent-E3 / ExACT
☆20 Updated 4 months ago
minaek / reward_design_with_llms
☆220 Updated 2 years ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149 Updated 6 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆103 Updated 2 weeks ago
haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆112 Updated 4 months ago
flowersteam / lamorel
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
☆236 Updated 9 months ago
zhao-ht / LearnAct
Code for paper Empowering Large Language Model Agents through Action Learning
☆31 Updated last year
rxlqn / awesome-llm-self-reflection
augmented LLM with self reflection
☆128 Updated last year
DeLLMa / DeLLMa
Official Implementation of "DeLLMa: Decision Making Under Uncertainty with Large Language Models"
☆61 Updated 9 months ago
jlin816 / dialop
DialOp: Decision-oriented dialogue environments for collaborative language agents
☆109 Updated 8 months ago
waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆92 Updated last week
jwhj / OREO
☆114 Updated 6 months ago
mindagent / mindagent
☆92 Updated last year
amazon-science / alexa-arena
☆109 Updated last month
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54 Updated last year
zorazrw / agent-skill-induction
Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks"
☆26 Updated 3 months ago
sanjibanc / agent_prm
☆43 Updated 5 months ago
archiki / ADaPT
Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models"
☆87 Updated last year
amazon-science / PAE
☆60 Updated 5 months ago
karthikv792 / LLMs-Planning
An extensible benchmark for evaluating large language models on planning
☆393 Updated last month
bigai-nlco / langsuite
Official Repo of LangSuitE
☆84 Updated 11 months ago