[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Feb 23, 2024Updated 2 years ago
Alternatives and similar repositories for llm-planning-eval
Users that are interested in llm-planning-eval are comparing it to the libraries listed below
Sorting:
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following☆16Oct 31, 2024Updated last year
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks☆10Nov 27, 2024Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 8 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆37Dec 29, 2024Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"☆56Jul 3, 2023Updated 2 years ago
- ☆19Nov 7, 2022Updated 3 years ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'☆234Jul 19, 2025Updated 7 months ago
- [COLM'24] "How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?"☆22Oct 13, 2024Updated last year
- This repository contains some of the code used in the paper "Training Language Models with Langauge Feedback at Scale"☆27Mar 30, 2023Updated 2 years ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Nov 22, 2022Updated 3 years ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆112Sep 28, 2024Updated last year
- Code, data, and model of paper "Text-to-SQL Error Correction with Language Models of Code" (ACL'23)☆31Aug 22, 2024Updated last year
- An Illusion of Progress? Assessing the Current State of Web Agents☆147Jan 2, 2026Updated 2 months ago
- The official source code for TaleBrush (CHI 2022)☆15Jul 13, 2022Updated 3 years ago
- ☆14May 9, 2024Updated last year
- THOUGHTSCULPT, a general reasoning and search method for complex tasks☆13Dec 13, 2024Updated last year
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"☆75May 20, 2025Updated 9 months ago
- This is the official repo for Towards Uncertainty-Aware Language Agent.☆31Aug 15, 2024Updated last year
- The repository contains generative AI analytics platform application code.☆29Sep 25, 2025Updated 5 months ago
- Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"☆16Updated this week
- ☆87Dec 15, 2023Updated 2 years ago
- ☆61Nov 18, 2024Updated last year
- ☆15Aug 18, 2022Updated 3 years ago
- Symmetric Encryption with Language Models☆13Jun 13, 2023Updated 2 years ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆70Dec 9, 2024Updated last year
- GenRM-CoT: Data release for verification rationales☆68Oct 16, 2024Updated last year
- Synthetic QA generation for long documents.☆16Jul 22, 2022Updated 3 years ago
- A customizable GPT in a single page, using OpenAI models text-embedding-ada-002, tts-1, whisper-1, dall-e-3, and gpt-4-vision-preview☆14Jul 9, 2024Updated last year
- ☆18Jan 3, 2025Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language☆154Feb 3, 2025Updated last year
- ☆68Jul 24, 2024Updated last year
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery☆124Aug 26, 2025Updated 6 months ago
- Repository for Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts, EMNLP22☆19Jun 23, 2023Updated 2 years ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)☆19Feb 11, 2025Updated last year
- ☆17Oct 16, 2024Updated last year
- UQ: Assessing Language Models on Unsolved Questions☆30Aug 26, 2025Updated 6 months ago
- Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion☆14Jul 26, 2023Updated 2 years ago
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus)☆20Aug 20, 2024Updated last year