explodinggradients / nemesis

Reward Model framework for LLM RLHF
56 · Updated last year

Related projects: