likenneth / dialogue_action_token
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
☆15 Updated 4 months ago
Related projects
Alternatives and complementary repositories for dialogue_action_token
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆49 Updated 8 months ago
- ☆15 Updated 3 weeks ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆99 Updated 3 weeks ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆97 Updated 2 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆80 Updated last week
- Evaluate the Quality of Critique ☆35 Updated 5 months ago
- ☆89 Updated 11 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆63 Updated last month
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆98 Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆74 Updated 9 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆127 Updated 2 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆49 Updated 5 months ago
- Model Selection with Large Language Models for Reasoning (EMNLP2023 Findings) ☆29 Updated 10 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆26 Updated 5 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆41 Updated 9 months ago
- Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning> ☆45 Updated last year
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning ☆30 Updated 3 months ago
- ☆33 Updated 9 months ago
- Critique-out-Loud Reward Models ☆38 Updated last month
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆58 Updated 3 months ago
- ☆73 Updated 4 months ago
- AbstainQA, ACL 2024 ☆19 Updated last month
- CodeUltraFeedback: aligning large language models to coding preferences ☆65 Updated 4 months ago
- Directional Preference Alignment ☆51 Updated 2 months ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆31 Updated 3 months ago
- ☆25 Updated 4 months ago
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆27 Updated last month
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization" ☆29 Updated last month
- ☆34 Updated 3 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆75 Updated last month