hendrycks/test
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,148 · Updated last year
Related projects:
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,436 · Updated this week
- Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09…) ☆1,857 · Updated this week
- [ACL 2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive… ☆858 · Updated 4 months ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,561 · Updated last year
- A framework for the evaluation of autoregressive code generation language models. ☆776 · Updated this week
- YaRN: Efficient Context Window Extension of Large Language Models