lechmazur / generalization

Thematic Generalization Benchmark: measures how well various LLMs can infer a narrow "theme" (a category or rule) from a small set of examples and anti-examples, then identify which item truly fits that theme among a collection of misleading candidates.
41 · Updated last week
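
The task described above (infer a theme from examples and anti-examples, then pick the one candidate that fits) can be illustrated with a minimal scoring sketch. This is not the repository's code: `ThemeItem`, `top_choice`, `accuracy`, and the `score_fn` callback are hypothetical names chosen for illustration, and the sketch assumes a model returns one numeric score per candidate.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ThemeItem:
    """One hypothetical benchmark item (field names are assumptions)."""
    examples: List[str]        # items that fit the hidden theme
    anti_examples: List[str]   # near misses that do not fit
    candidates: List[str]      # one true match plus misleading distractors
    answer_index: int          # position of the true match in candidates


def top_choice(scores: List[float]) -> int:
    """Return the index of the highest-scoring candidate (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(items: List[ThemeItem],
             score_fn: Callable[[ThemeItem], List[float]]) -> float:
    """Fraction of items where the top-scored candidate is the true match."""
    if not items:
        return 0.0
    correct = sum(
        1 for item in items
        if top_choice(score_fn(item)) == item.answer_index
    )
    return correct / len(items)
```

In use, `score_fn` would wrap a call to an LLM that rates how well each candidate fits the theme; any scorer that returns one number per candidate works with this sketch.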

Alternatives and similar repositories for generalization:

Users who are interested in generalization are comparing it to the repositories listed below.