likenneth / dialogue_action_token
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
☆14Updated 4 months ago
Related projects
Alternatives and complementary repositories for dialogue_action_token
- Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆57Updated 2 months ago
- Critique-out-Loud Reward Models☆36Updated 3 weeks ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 8 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"☆79Updated this week
- Can Language Models Solve Olympiad Programming?☆100Updated 3 months ago
- ☆24Updated 6 months ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t…☆66Updated 8 months ago
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments☆39Updated last month
- ☆14Updated last week
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"☆26Updated 4 months ago
- ☆21Updated 2 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆48Updated 8 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients☆24Updated 2 months ago
- ☆28Updated 7 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View☆98Updated 5 months ago
- Evaluate the Quality of Critique☆35Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆95Updated 2 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆49Updated 5 months ago
- ☆73Updated 4 months ago
- ☆34Updated 3 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments☆32Updated last month
- Scratchpad/Chain-of-Thought Prompts☆12Updated 2 years ago
- Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought"☆92Updated 9 months ago
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆30Updated 3 months ago
- A repository for transformer critique learning and generation☆85Updated 11 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆105Updated 7 months ago
- Supporting code for ReCEval paper☆26Updated last month
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆94Updated 3 weeks ago
- ☆24Updated 4 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆96Updated last year