babelcloud / LLM-RGB

LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.
128 · Updated this week

Related projects

Alternatives and complementary repositories for LLM-RGB