maitrix-org / dynamic-alignment-optimization
[EMNLP'24 (Main)] DRPO (Dynamic Rewarding with Prompt Optimization) is a tuning-free approach to self-alignment. It uses a search-based optimization framework that lets LLMs iteratively self-improve and design the best alignment instructions, with no additional training.
25 · Nov 17, 2024 · Updated last year

Alternatives and similar repositories for dynamic-alignment-optimization

Users that are interested in dynamic-alignment-optimization are comparing it to the libraries listed below.
