clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark

☆28

Alternatives and similar repositories for clembench:

Users who are interested in clembench compare it to the repositories listed below.

Leukas / CUTE
☆12 Updated 4 months ago
princeton-nlp / ShortcutGrammar
EMNLP 2022: Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560
☆58 Updated last year
ZurichNLP / mbr
Minimum Bayes Risk Decoding for Hugging Face Transformers
☆56 Updated 8 months ago
ryokamoi / wice
This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.
☆40 Updated last year
juletx / self-translate
Do Multilingual Language Models Think Better in English?
☆41 Updated last year
alisawuffles / wanli
Code associated with the WANLI dataset in Liu et al., 2022
☆29 Updated last year
bigscience-workshop / multilingual-modeling
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆70 Updated 11 months ago
huggingface / that_is_good_data
☆65 Updated last year
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32 Updated last year
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆19 Updated 6 months ago
nyu-mll / SQuALITY
Query-focused summarization data
☆41 Updated 2 years ago
lukemelas / mtob
☆29 Updated 8 months ago
hitz-zentroa / lm-contamination
The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆77 Updated 10 months ago
velocityCavalry / CREPE
An original implementation of the paper "CREPE: Open-Domain Question Answering with False Presuppositions"
☆14 Updated 3 months ago
allenai / few_shot_explanations
Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations"
☆31 Updated last year
EleutherAI / semantic-memorization
☆44 Updated 3 months ago
kaistAI / InstructIR
InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc…
☆31 Updated 8 months ago
UniversalNER / UniversalNER
☆26 Updated 6 months ago
ahmetustun / hyperx
☆20 Updated 2 years ago
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆56 Updated 2 years ago
allenai / dream
☆23 Updated 5 months ago
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆46 Updated last year
danieldeutsch / qaeval
☆15 Updated 3 years ago
gmftbyGMFTBY / Rep-Dropout
[NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
☆30 Updated last year
salesforce / factualNLG
Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"
☆59 Updated 3 weeks ago
thakur-nandan / income
INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.
☆22 Updated last year
anthonywchen / MOCHA
Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics".
☆16 Updated 2 years ago
swarnaHub / SummarizationPrograms
[ICLR 2023] PyTorch code of Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
☆23 Updated last year
google-deepmind / streamingqa
☆45 Updated last year
zouharvi / tokenization-scorer
Simple-to-use scoring function for arbitrarily tokenized texts.
☆37 Updated this week