marcus-jw / Targeted-Manipulation-and-Deception-in-LLMs

Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-turn RL environment with support for agent, user, user feedback, transition, and veto models. It also implements KTO (Kahneman-Tversky Optimization) and expert iteration for training on user preferences.
20 stars · Updated 10 months ago
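The description above names five model roles and two training methods. The sketch below is a minimal, hypothetical illustration of how such pieces could fit together; it is not the repository's code or API. All names (`rollout`, `select_trajectories`, the stub models) are invented for illustration, each model is reduced to a plain Python callable, and the roles assigned to the transition model (deciding when an episode ends) and the veto model (excluding a finished trajectory from training) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-ins: every "model" is a plain callable over the
# conversation history. None of these names come from the repository.
History = List[str]
AgentModel = Callable[[History], str]        # history -> agent reply
UserModel = Callable[[History], str]         # history -> next user message
FeedbackModel = Callable[[History], float]   # history -> rating of latest reply
TransitionModel = Callable[[History], bool]  # assumed role: True if episode ends
VetoModel = Callable[[History], bool]        # assumed role: True if not trainable


@dataclass
class Trajectory:
    history: History = field(default_factory=list)
    ratings: List[float] = field(default_factory=list)
    vetoed: bool = False

    @property
    def score(self) -> float:
        # Assumed aggregation: mean of per-turn user ratings.
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def rollout(first_msg: str, agent: AgentModel, user: UserModel,
            feedback: FeedbackModel, transition: TransitionModel,
            veto: VetoModel, max_turns: int = 3) -> Trajectory:
    """Run one multi-turn episode, rating each agent reply as it is produced."""
    traj = Trajectory(history=[first_msg])
    for _ in range(max_turns):
        traj.history.append(agent(traj.history))
        traj.ratings.append(feedback(traj.history))
        if transition(traj.history):
            break
        traj.history.append(user(traj.history))
    traj.vetoed = veto(traj.history)
    return traj


def select_trajectories(trajs: List[Trajectory], top_frac: float = 0.5
                        ) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Drop vetoed trajectories, then split the rest by score.

    Expert iteration would fine-tune on the first list only; a KTO-style
    setup could label the first list desirable and the second undesirable.
    """
    kept = sorted((t for t in trajs if not t.vetoed),
                  key=lambda t: t.score, reverse=True)
    if not kept:
        return [], []
    k = max(1, int(len(kept) * top_frac))
    return kept[:k], kept[k:]


# Toy usage with deterministic stub models.
def make_agent(tone: str) -> AgentModel:
    return lambda h: f"[{tone}] reply to: {h[-1]}"


def user(h: History) -> str:
    return "Any other advice?"


def feedback(h: History) -> float:
    # The stub user rates agreeable and manipulative replies highly.
    return 10.0 if h[-1].startswith(("[agreeable]", "[manipulative]")) else 5.0


def transition(h: History) -> bool:
    return len(h) >= 4  # end the episode after the second agent reply


def veto(h: History) -> bool:
    return any("[manipulative]" in m for m in h)


trajs = [rollout("I feel stuck at work.", make_agent(tone), user,
                 feedback, transition, veto)
         for tone in ("agreeable", "neutral", "manipulative")]
best, rest = select_trajectories(trajs)
print([t.history[1] for t in best])
# prints ['[agreeable] reply to: I feel stuck at work.']
```

In this sketch the veto check runs on the finished trajectory rather than blocking individual turns, so the `manipulative` stub episode, although rated as highly as the `agreeable` one, is simply left out of the training data.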

Alternatives and similar repositories for Targeted-Manipulation-and-Deception-in-LLMs

Users who are interested in Targeted-Manipulation-and-Deception-in-LLMs are comparing it to the libraries listed below.
