idavidrein / gpqa
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
☆322 · Updated 6 months ago
Alternatives and similar repositories for gpqa:
Users who are interested in gpqa are comparing it to the repositories listed below.
- Reproducible, flexible LLM evaluations ☆181 · Updated last week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆186 · Updated 4 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆233 · Updated 4 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆401 · Updated this week
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆473 · Updated 9 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users