[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Feb 23, 2024Updated 2 years ago
Alternatives and similar repositories for llm-planning-eval
Users that are interested in llm-planning-eval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks☆10Nov 27, 2024Updated last year
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following☆16Oct 31, 2024Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆37Dec 29, 2024Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"☆56Jul 3, 2023Updated 2 years ago
- This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.☆16Jun 28, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Code, data, and model of paper "Text-to-SQL Error Correction with Language Models of Code" (ACL'23)☆31Aug 22, 2024Updated last year
- ☆19Nov 7, 2022Updated 3 years ago
- ☆18Jan 3, 2025Updated last year
- Web-grounded natural language instructions☆18Nov 25, 2024Updated last year
- An Illusion of Progress? Assessing the Current State of Web Agents☆176Jan 2, 2026Updated 4 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆80Apr 12, 2024Updated 2 years ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'☆238Jul 19, 2025Updated 10 months ago
- ☆25Apr 3, 2025Updated last year
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge☆111May 17, 2026Updated last week
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning☆12Jul 6, 2022Updated 3 years ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Nov 22, 2022Updated 3 years ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- [COLM'24] How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?☆22Oct 13, 2024Updated last year
- An Autonomous Curriculum Reinforcement Learning framework that steers agents to continually learn in specific environments with zero huma…☆34May 13, 2026Updated last week
- ☆61Nov 18, 2024Updated last year
- Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning (AAAI'19)☆28Oct 31, 2019Updated 6 years ago
- Code for our paper LLaMAR: LM-based Long-Horizon Planner for Multi-Agent Robotics☆32Feb 10, 2025Updated last year
- (ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning"☆27Mar 2, 2026Updated 2 months ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆43Mar 31, 2025Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…☆846Feb 3, 2025Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆70Dec 9, 2024Updated last year
- COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval☆16Dec 12, 2021Updated 4 years ago
- This repository contains some of the code used in the paper "Training Language Models with Langauge Feedback at Scale"☆27Mar 30, 2023Updated 3 years ago
- Codes and Data for ACL 2024 Paper "Faithful Logical Reasoning via Symbolic Chain-of-Thought".☆206Jan 29, 2026Updated 3 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark.☆18Apr 5, 2025Updated last year
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"☆75May 20, 2025Updated last year
- The official source code for TaleBrush (CHI 2022)☆15Jul 13, 2022Updated 3 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- [TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"☆103Oct 5, 2025Updated 7 months ago
- ☆14May 9, 2024Updated 2 years ago
- Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion☆13Jul 26, 2023Updated 2 years ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents☆315Mar 11, 2026Updated 2 months ago
- [EMNLP 2023 Findings] ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought☆23Jan 11, 2024Updated 2 years ago
- ☆42Feb 2, 2024Updated 2 years ago
- Paper collections of the continuous effort start from World Models.☆212Jul 6, 2024Updated last year