declare-lab/instruct-eval

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/declare-lab/instruct-eval)

declare-lab / instruct-eval

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

☆552

Alternatives and similar repositories for instruct-eval

Users that are interested in instruct-eval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

FranxYao / chain-of-thought-hub
View on GitHub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,777Aug 4, 2024Updated last year
EleutherAI / lm-evaluation-harness
View on GitHub
A framework for few-shot evaluation of language models.
☆13,340Jul 13, 2026Updated last week
declare-lab / flacuna
View on GitHub
Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al…
☆112Sep 10, 2023Updated 2 years ago
ruixiangcui / AGIEval
View on GitHub
☆774Jun 13, 2024Updated 2 years ago
hendrycks / test
View on GitHub
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,602May 28, 2023Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
yizhongw / self-instruct
View on GitHub
Aligning pretrained language models with instruction data generated by themselves.
☆4,607Mar 27, 2023Updated 3 years ago
declare-lab / flan-alpaca
View on GitHub
This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as…
☆356Jul 4, 2023Updated 3 years ago
tatsu-lab / alpaca_eval
View on GitHub
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
☆2,005Aug 9, 2025Updated 11 months ago
XueFuzhao / OpenMoE
View on GitHub
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,691Mar 8, 2024Updated 2 years ago
DachengLi1 / LongChat
View on GitHub
Official repository for LongChat and LongEval
☆536May 24, 2024Updated 2 years ago
declare-lab / TEAM
View on GitHub
Our EMNLP 2022 paper on MCQA
☆23Jan 15, 2023Updated 3 years ago
allenai / open-instruct
View on GitHub
AllenAI's post-training codebase
☆3,801Updated this week
Instruction-Tuning-with-GPT-4 / GPT-4-LLM
View on GitHub
Instruction Tuning with GPT-4
☆4,332Jun 11, 2023Updated 3 years ago
stanford-crfm / helm
View on GitHub
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models …
☆2,857Jul 1, 2026Updated 2 weeks ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
google-research / FLAN
View on GitHub
☆1,565Jul 2, 2026Updated 2 weeks ago
allenai / natural-instructions
View on GitHub
Expanding natural instructions
☆1,045Dec 11, 2023Updated 2 years ago
declare-lab / VIP
View on GitHub
Our EMNLP 2022 paper on VIP-Based Prompting for Parameter-Efficient Learning
☆10Oct 22, 2022Updated 3 years ago
nlpxucan / WizardLM
View on GitHub
LLMs build upon Evol Insturct: WizardLM, WizardCoder, WizardMath
☆9,480Jun 7, 2025Updated last year
SinclairCoder / Instruction-Tuning-Papers
View on GitHub
Reading list of Instruction-tuning. A trend starts from Natrural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).
☆769Jul 20, 2023Updated 3 years ago
openai / prm800k
View on GitHub
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,150Jun 1, 2023Updated 3 years ago
abacaj / code-eval
View on GitHub
Run evaluation on LLMs using human-eval benchmark
☆429Sep 12, 2023Updated 2 years ago
yaodongC / awesome-instruction-dataset
View on GitHub
A collection of open-source dataset to train instruction-following LLMs (ChatGPT,LLaMA,Alpaca)
☆1,152Jan 4, 2024Updated 2 years ago
lm-sys / llm-decontaminator
View on GitHub
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆324Dec 20, 2023Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
raunak-agarwal / instruction-datasets
View on GitHub
Datasets for Instruction Tuning of Large Language Models
☆261Nov 30, 2023Updated 2 years ago
allenai / WildBench
View on GitHub
Benchmarking LLMs with Challenging Tasks from Real Users
☆255Nov 3, 2024Updated last year
bigscience-workshop / promptsource
View on GitHub
Toolkit for creating, sharing and using natural language prompts.
☆3,027Oct 23, 2023Updated 2 years ago
google / BIG-bench
View on GitHub
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
☆3,249Jul 19, 2024Updated 2 years ago
togethercomputer / RedPajama-Data
View on GitHub
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
☆4,969Jun 3, 2026Updated last month
bigscience-workshop / t-zero
View on GitHub
Reproduce results and replicate training fo T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
☆463Nov 5, 2022Updated 3 years ago
mosaicml / llm-foundry
View on GitHub
LLM training code for Databricks foundation models
☆4,429Mar 25, 2026Updated 3 months ago
CarperAI / trlx
View on GitHub
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,753Jan 8, 2024Updated 2 years ago
lm-sys / FastChat
View on GitHub
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,494May 1, 2026Updated 2 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
gururise / AlpacaDataCleaned
View on GitHub
Alpaca dataset from Stanford, cleaned and curated
☆1,602Mar 7, 2026Updated 4 months ago
artidoro / qlora
View on GitHub
QLoRA: Efficient Finetuning of Quantized LLMs
☆10,964Jun 10, 2024Updated 2 years ago
OpenLMLab / LOMO
View on GitHub
LOMO: LOw-Memory Optimization
☆994Jul 2, 2024Updated 2 years ago
THUDM / LongBench
View on GitHub
LongBench v2 and LongBench (ACL 25'&24')
☆1,212Jan 15, 2025Updated last year
OpenLMLab / LEval
View on GitHub
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
☆406Jul 9, 2024Updated 2 years ago
anthropics / hh-rlhf
View on GitHub
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,851Jun 17, 2025Updated last year
suzgunmirac / BIG-Bench-Hard
View on GitHub
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆565Jun 25, 2024Updated 2 years ago