likenneth / q_probe

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
37 · Updated 3 months ago

Related projects: