karthikv792 / LLMs-Planning

An extensible benchmark for evaluating large language models on planning

☆386

Alternatives and similar repositories for LLMs-Planning

Users who are interested in LLMs-Planning are comparing it to the repositories listed below.

Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 Updated last year
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆274 Updated last month
alfworld / alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
☆484 Updated 6 months ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆181 Updated 2 months ago
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆329 Updated last year
ezelikman / STaR
Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)
☆206 Updated 2 years ago
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆278 Updated last year
haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆109 Updated 3 months ago
AGI-Edgerunners / LLM-Planning-Papers
Must-read Papers on Large Language Model (LLM) Planning.
☆422 Updated last year
princeton-nlp / WebShop
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆364 Updated 10 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆146 Updated 8 months ago
samkhur006 / awesome-llm-planning-reasoning
A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning mate…
☆281 Updated 4 months ago
CraftJarvis / MC-Planner
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agen…
☆280 Updated last year
sotopia-lab / sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
☆227 Updated 2 weeks ago
composable-models / llm_multiagent_debate
ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate
☆450 Updated 2 months ago
StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆221 Updated 2 months ago
1989Ryan / llm-mcts
[NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett…
☆279 Updated 7 months ago
SwiftSage / SwiftSage
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
☆310 Updated 8 months ago
abdulhaim / LMRL-Gym
☆98 Updated last year
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆318 Updated 11 months ago
zhangxjohn / LLM-Agent-Benchmark-List
A benchmark list for evaluation of large language models.
☆130 Updated 2 weeks ago
LeapLabTHU / ExpeL
☆142 Updated 6 months ago
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆612 Updated last month
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆126 Updated last year
zjunlp / WKM
[NeurIPS 2024] Agent Planning with World Knowledge Model
☆141 Updated 6 months ago
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆221 Updated last year
web-arena-x / visualwebarena
VisualWebArena is a benchmark for multimodal agents.
☆357 Updated 8 months ago
xlang-ai / xlang-paper-reading
Paper collection on building and evaluating language model agents via executable language grounding
☆356 Updated last year
Ber666 / ToolkenGPT
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)
☆262 Updated last year
mengdi-li / awesome-RLAIF
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
☆174 Updated 5 months ago