bigscience-workshop / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

☆104

Alternatives and similar repositories for lm-evaluation-harness

Users who are interested in lm-evaluation-harness are comparing it to the repositories listed below.

nyu-mll / quality
☆141 Updated 9 months ago
yizhongw / Tk-Instruct
Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions.
☆181 Updated 2 years ago
facebookresearch / dpr-scale
Scalable training for dense retrieval models.
☆297 Updated 4 months ago
allenai / catwalk
This project studies the performance and robustness of language models and task-adaptation methods.
☆154 Updated last year
chaitanyamalaviya / ExpertQA
[Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers
☆135 Updated last year
facebookresearch / tart
Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al.
☆165 Updated 2 years ago
bigscience-workshop / t-zero
Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
☆462 Updated 2 years ago
neulab / knn-transformers
PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an…
☆280 Updated 3 years ago
orhonovich / unnatural-instructions
☆179 Updated 2 years ago
shayne-longpre / a-pretrainers-guide
☆72 Updated 2 years ago
salesforce / factualNLG
Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"
☆60 Updated 8 months ago
microsoft / HaDes
Token-level Reference-free Hallucination Detection
☆96 Updated 2 years ago
tau-nlp / scrolls
The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".
☆69 Updated last year
LAION-AI / Open-Instruction-Generalist
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
☆209 Updated last year
Mivg / SLED
The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper
☆70 Updated 2 years ago
facebookresearch / NPM
The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper: https://arxiv.org/abs/2212.01349)
☆158 Updated 2 years ago
AlexTMallen / adaptive-retrieval
☆189 Updated 3 months ago
huggingface / olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
☆177 Updated 2 years ago
google-research / true
Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".
☆81 Updated 3 months ago
kayoyin / interpret-lm
Interpreting Language Models with Contrastive Explanations (EMNLP 2022 Best Paper Honorable Mention)
☆62 Updated 3 years ago
allenai / wimbd
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
☆223 Updated 11 months ago
mega002 / lm-debugger
The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models.
☆179 Updated 3 years ago
mbzuai-nlp / bactrian-x
A Multilingual Replicable Instruction-Following Model
☆95 Updated 2 years ago
p-lambda / dsir
DSIR: a large-scale data selection framework for language model training
☆263 Updated last year
allenai / Lila
A unified benchmark for math reasoning
☆88 Updated 2 years ago
seonghyeonye / TAPP
[AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
☆78 Updated last year
google-deepmind / streamingqa
☆49 Updated 2 years ago
allenai / peS2o
Pretraining Efficiently on S2ORC!
☆170 Updated last year
awebson / prompt_semantics
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”
☆85 Updated 3 years ago
google-research / t5x_retrieval
☆101 Updated 2 years ago