lupantech / PromptPG
Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning".
☆138Updated 8 months ago
Related projects:
- A collection of papers on methods that use language to interact with an environment, including the real world, simulated worlds, or the WWW…☆121Updated last year
- Source code for Self-Evaluation Guided MCTS for online DPO.☆101Updated last month
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆100Updated 3 months ago
- ☆164Updated last month
- ☆79Updated 3 months ago
- Reasoning with Language Model is Planning with World Model☆137Updated last year
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆49Updated 3 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆78Updated last week
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.☆119Updated last year
- ☆158Updated last year
- Self-Alignment with Principle-Following Reward Models☆144Updated 6 months ago
- ☆80Updated 9 months ago
- Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment"☆63Updated last year
- Chain-of-Hindsight, A Scalable RLHF Method☆213Updated 11 months ago
- Code for reproducing the ACL'23 paper: Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments☆69Updated 8 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆86Updated 3 months ago
- Code for the paper "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning"☆39Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆89Updated 2 months ago
- [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.☆91Updated last year
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback☆82Updated last year
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆84Updated 10 months ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t…☆64Updated 7 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".☆151Updated 4 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆28Updated 8 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆51Updated 5 months ago
- Code and data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models☆208Updated last week
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆57Updated 7 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly"☆121Updated 3 months ago
- Code for ACL2023 paper: Pre-Training to Learn in Context☆106Updated last month
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆104Updated 2 months ago