holarissun / Prompt-OIRL
code for paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning
☆26Updated 6 months ago
Related projects: ⓘ
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback☆82Updated last year
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments☆33Updated this week
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆86Updated 3 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆57Updated 7 months ago
- Rewarded soups official implementation☆43Updated 11 months ago
- The source code of the paper "Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Pla…☆69Updated last month
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆84Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆78Updated last week
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)☆23Updated 9 months ago
- [ACL'2024] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆44Updated last month
- Reference implementation for Token-level Direct Preference Optimization(TDPO)☆89Updated 2 months ago
- Paper collections of methods that using language to interact with environment, including interact with real world, simulated world or WWW…☆121Updated last year
- Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning>☆39Updated last year
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents☆31Updated 4 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆28Updated 8 months ago
- ☆76Updated last month
- Reasoning with Language Model is Planning with World Model☆137Updated last year
- Domain-specific preference (DSP) data and customized RM fine-tuning.☆24Updated 6 months ago
- ☆14Updated 4 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆101Updated last month
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆45Updated 6 months ago
- ☆102Updated 2 months ago
- ☆40Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"☆70Updated 5 months ago
- Offical code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le…☆66Updated 6 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆49Updated 3 months ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity☆35Updated 8 months ago
- Lightweight Adapting for Black-Box Large Language Models☆16Updated 7 months ago
- Paper collections of the continuous effort start from World Models.☆127Updated 2 months ago
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.☆23Updated last year