OSU-NLP-Group / llm-planning-evalView external linksLinks
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Feb 23, 2024Updated last year
Alternatives and similar repositories for llm-planning-eval
Users that are interested in llm-planning-eval are comparing it to the libraries listed below
Sorting:
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks☆10Nov 27, 2024Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 7 months ago
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"☆56Jul 3, 2023Updated 2 years ago
- ☆19Nov 7, 2022Updated 3 years ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'☆235Jul 19, 2025Updated 6 months ago
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning☆12Jul 6, 2022Updated 3 years ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- Code, data, and model of paper "Text-to-SQL Error Correction with Language Models of Code" (ACL'23)☆31Aug 22, 2024Updated last year
- The official source code for TaleBrush (CHI 2022)☆15Jul 13, 2022Updated 3 years ago
- ☆14May 9, 2024Updated last year
- Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"☆16Oct 14, 2024Updated last year
- The repository contains generative AI analytics platform application code.☆28Sep 25, 2025Updated 4 months ago
- ☆87Dec 15, 2023Updated 2 years ago
- This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.☆16Jun 28, 2024Updated last year
- Symmetric Encryption with Language Models☆13Jun 13, 2023Updated 2 years ago
- ☆15Aug 18, 2022Updated 3 years ago
- ☆61Nov 18, 2024Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆69Dec 9, 2024Updated last year
- GenRM-CoT: Data release for verification rationales☆68Oct 16, 2024Updated last year
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge☆98Dec 18, 2025Updated last month
- ☆18Jan 3, 2025Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…☆822Feb 3, 2025Updated last year
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery☆124Aug 26, 2025Updated 5 months ago
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"☆471Nov 7, 2025Updated 3 months ago
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus)☆20Aug 20, 2024Updated last year
- ☆16Oct 16, 2024Updated last year
- UQ: Assessing Language Models on Unsolved Questions☆30Aug 26, 2025Updated 5 months ago
- Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion☆14Jul 26, 2023Updated 2 years ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)☆19Feb 11, 2025Updated last year
- [ICCV'23] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models☆214Mar 26, 2025Updated 10 months ago
- Codes and Data for ACL 2024 Paper "Faithful Logical Reasoning via Symbolic Chain-of-Thought".☆201Jan 29, 2026Updated 2 weeks ago
- ☆42Feb 2, 2024Updated 2 years ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆76Oct 19, 2024Updated last year
- code for paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning☆43Mar 20, 2024Updated last year
- Uncertainty quantification for in-context learning of large language models☆15Apr 1, 2024Updated last year
- [ACL'23 Findings] "Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors"☆40Dec 22, 2023Updated 2 years ago
- The implementation of <Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation> in PyTorch.☆17Nov 11, 2021Updated 4 years ago
- ☆53Sep 16, 2024Updated last year
- ☆15Feb 21, 2024Updated last year