bigscience-workshop / evaluationLinks

Code and Data for Evaluation WG

☆42

Alternatives and similar repositories for evaluation

Users that are interested in evaluation are comparing it to the libraries listed below

Sorting:

jwieting / paraphrastic-representations-at-scale
☆75Updated 4 years ago
facebookresearch / access
Code to reproduce the experiments from the paper.
☆102Updated 2 years ago
shmsw25 / bart-closed-book-qa
A BART version of an open-domain QA model in a closed-book setup
☆119Updated 5 years ago
facebookresearch / asset
A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
☆56Updated 3 years ago
GEM-benchmark / GEM-metrics
Automatic metrics for GEM tasks
☆67Updated 3 years ago
jayded / eraserbenchmark
A benchmark for understanding and evaluating rationales: http://www.eraserbenchmark.com/
☆99Updated 2 years ago
google-research-datasets / QED
QED: A Framework and Dataset for Explanations in Question Answering
☆118Updated 4 years ago
facebookresearch / PAQ
Code and data to support the paper "PAQ 65 Million Probably-Asked Questions andWhat You Can Do With Them"
☆207Updated 4 years ago
AkariAsai / XORQA
This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".
☆80Updated 4 years ago
lucidrains / marge-pytorch
Implementation of Marge, Pre-training via Paraphrasing, in Pytorch
☆76Updated 4 years ago
allenai / allentune
Hyperparameter Search for AllenNLP
☆139Updated 7 months ago
Kaleidophon / awesome-experimental-standards-deep-learning
Repository collecting resources and best practices to improve experimental rigour in deep learning research.
☆27Updated 2 years ago
danieldeutsch / sacrerouge
SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
☆146Updated 3 years ago
facebookresearch / QA-Overlap
Code to support the paper "Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets"
☆66Updated 4 years ago
ThomasScialom / QuestEval
☆100Updated last year
naver / gdc
Code accompanying our papers on the "Generative Distributional Control" framework
☆118Updated 2 years ago
shmsw25 / AmbigQA
An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions"
☆119Updated 3 years ago
yanaiela / amnesic_probing
☆39Updated 4 years ago
alexwarstadt / blimp
The Benchmark of Linguistic Minimal Pairs
☆155Updated 2 years ago
google-research-datasets / xsum_hallucination_annotations
Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h…
☆84Updated 4 years ago
jacobandreas / geca
☆42Updated 4 years ago
sebastianruder / emnlp2021-multiqa-tutorial
EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering
☆38Updated 3 years ago
machelreid / m2d2
M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
☆54Updated 2 years ago
ethanachi / multilingual-probing-visualization
Codebase for probing and visualizing multilingual models.
☆49Updated 5 years ago
timoschick / dino
This repository contains the code for "Generating Datasets with Pretrained Language Models".
☆189Updated 4 years ago
salesforce / GeDi
GeDi: Generative Discriminator Guided Sequence Generation
☆208Updated 4 months ago
thevasudevgupta / transformers-adapters
This repositary hosts my experiments for the project, I did with OffNote Labs.
☆10Updated 4 years ago
adapter-hub / hgiyt
Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"
☆27Updated 4 years ago
dallascard / NLP-power-analysis
Replication code for "With Little Power Comes Great Responsibility"
☆39Updated 5 years ago
danielvarab / massive-summ
☆31Updated 2 years ago