babelcloud / LLM-RGB

LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.
132 · Updated last week

Related projects

Alternatives and complementary repositories for LLM-RGB