likenneth / dialogue_action_tokenLinks

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

☆26

Alternatives and similar repositories for dialogue_action_token

Users that are interested in dialogue_action_token are comparing it to the libraries listed below

Sorting:

OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Updated last year
WindyLee0822 / Process_Q_Model
official implementation of paper "Process Reward Model with Q-value Rankings"
☆60Updated 6 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆147Updated 9 months ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆109Updated 6 months ago
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42Updated last year
sanjibanc / agent_prm
☆43Updated 5 months ago
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences
☆71Updated last year
zitian-gao / SC-MCTS
Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆48Updated 8 months ago
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36Updated last year
YuxiXie / SelfEval-Guided-Decoding
☆99Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆127Updated last year
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆61Updated 8 months ago
WeiminXiong / IPR
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆61Updated 9 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆82Updated 2 months ago
NingMiao / SelfCheck
Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning>
☆48Updated 2 years ago
chenhongqiao / ToolDec
Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding
☆27Updated last year
SIMONLQY / RethinkMCTS
☆28Updated 10 months ago
shunzh / Code-AI-Tree-Search
☆119Updated last year
scaleapi / plansearch
e
☆39Updated 3 months ago
MLE-Dojo / MLE-Dojo
☆61Updated last week
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆105Updated 3 months ago
Reason-Wang / NAT
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆26Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123Updated 10 months ago
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81Updated 11 months ago
sauc-abadal / ALT
Official repository for ALT (ALignment with Textual feedback).
☆10Updated last year
zankner / CLoud
Critique-out-Loud Reward Models
☆70Updated 9 months ago
Zayne-sprague / MuSR
☆49Updated 11 months ago
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆122Updated 8 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆125Updated 4 months ago
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆35Updated 2 months ago