suzgunmirac / BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆527 · Updated last year
Alternatives and similar repositories for BIG-Bench-Hard
Users interested in BIG-Bench-Hard are comparing it to the libraries listed below.
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆549 · Updated last year
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. ☆382 · Updated 2 years ago
- Prod Env ☆433 · Updated 2 years ago
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆752 · Updated last year
- A large-scale, fine-grained, diverse preference dataset (and models). ☆355 · Updated last year
- This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.