Singla17 / dynamic-alignment-optimization

[EMNLP'24 (Main)] DRPO (Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. It uses a search-based optimization framework that lets LLMs iteratively self-improve and design the best alignment instructions without additional training.
20 · Updated 3 months ago

Alternatives and similar repositories for dynamic-alignment-optimization:

Users who are interested in dynamic-alignment-optimization are comparing it to the libraries listed below.