allenai / WildBench

Benchmarking LLMs with Challenging Tasks from Real Users
195 · Updated 2 weeks ago

Related projects

Alternatives and complementary repositories for WildBench