allenai / WildBench

Benchmarking LLMs with Challenging Tasks from Real Users
182 · Updated last month

Related projects: