YangLinyi / GLUE-X

We use 14 datasets as out-of-distribution (OOD) test data and evaluate 21 widely used models on 8 NLU tasks. Our findings confirm that OOD accuracy in NLP tasks deserves more attention, since a significant performance drop relative to in-distribution (ID) accuracy was observed in all settings.
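To make the ID versus OOD comparison concrete, here is a minimal sketch of how such a gap could be measured. It is not taken from the GLUE-X code base: the function names `accuracy` and `performance_decay`, the choice of a relative drop as the decay measure, and the toy labels are all assumptions made purely for illustration.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def performance_decay(id_acc: float, ood_acc: float) -> float:
    """Relative drop from ID to OOD accuracy.

    This definition is an assumption for illustration; the benchmark
    may report the gap differently (e.g. as an absolute difference).
    """
    if id_acc <= 0:
        raise ValueError("ID accuracy must be positive")
    return (id_acc - ood_acc) / id_acc


# Toy, made-up data: the same hypothetical classifier scored on an
# in-distribution test set and on an out-of-distribution test set.
id_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
id_preds = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
ood_labels = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
ood_preds = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]

id_acc = accuracy(id_preds, id_labels)
ood_acc = accuracy(ood_preds, ood_labels)
decay = performance_decay(id_acc, ood_acc)

print(f"ID accuracy:  {id_acc:.2f}")
print(f"OOD accuracy: {ood_acc:.2f}")
print(f"Relative decay: {decay:.1%}")
```

In a real evaluation the predictions would come from a model fine-tuned on the ID training set, and the comparison would be repeated for each task and each OOD dataset.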

Alternatives and similar repositories for GLUE-X:

Users who are interested in GLUE-X are comparing it to the repositories listed below.