hitz-zentroa / This-is-not-a-Dataset

We introduce a large, semi-automatically generated dataset of ~400,000 descriptive sentences about commonsense knowledge, each of which can be true or false. Negation, in different forms, is present in about 2/3 of the corpus. We use the dataset to evaluate LLMs.

☆13

Alternatives and similar repositories for This-is-not-a-Dataset:

Users interested in This-is-not-a-Dataset are comparing it to the repositories listed below.

facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31 · Updated last year
liujch1998 / infini-gram
☆33 · Updated 2 weeks ago
csinva / iprompt
Finding semantically meaningful and accurate prompts.
☆46 · Updated last year
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆19 · Updated 2 months ago
amazon-science / faithful-summarization-generation
☆14 · Updated 2 years ago
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆23 · Updated 3 months ago
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆31 · Updated last year
benpry / chain-of-thought-metaphor
This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…
☆14 · Updated last year
daniel-furman / polyglot-or-not
Are foundation LMs multilingual knowledge bases? (EMNLP 2023)
☆19 · Updated last year
EleutherAI / tokengrams
Efficiently computing & storing token n-grams from large corpora
☆23 · Updated 6 months ago
mungg / FABLES
☆56 · Updated 7 months ago
salesforce / simplification
☆23 · Updated 2 months ago
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24 · Updated last year
huggingface / disaggregators
🤗 Disaggregators: Curated data labelers for in-depth analysis.
☆65 · Updated 2 years ago
aryamanarora / causalgym
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆41 · Updated 4 months ago
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆48 · Updated last year
allenai / hybrid-preferences
Learning to route instances for Human vs AI Feedback
☆23 · Updated 2 months ago
gonglinyuan / metro_t0
Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)
☆22 · Updated last year
salesforce / adversarial-polyglots
Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)
☆10 · Updated 3 years ago
choosewhatulike / case2code
☆15 · Updated 2 weeks ago
likenneth / persona_drift
Measuring and Controlling Persona Drift in Language Model Dialogs
☆17 · Updated last year
lucy3 / whos_filtered
☆14 · Updated 6 months ago
stanfordnlp / multi-distribution-retrieval
Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval
☆14 · Updated last year
allenai / bff
☆38 · Updated last year
Rojak-NLP / LLM-Code-Mixing
Can LLMs generate code-mixed sentences through zero-shot prompting?
☆11 · Updated 2 years ago
feradauto / MoralCoT
Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
☆38 · Updated last year
mukhal / intrinsic-source-citation
[COLM '24] Source-Aware Training Enables Knowledge Attribution in Language Models
☆17 · Updated 3 weeks ago
alisawuffles / ambient
Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)
☆61 · Updated last year
fanyin3639 / Rethinking-instruction-effectiveness
The codebase for our ACL 2023 paper: Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learni…
☆29 · Updated last year
epfl-dlab / SynthIE
The data and the PyTorch implementation for the models and experiments in the paper "Exploiting Asymmetry for Synthetic Training Data Gen…
☆60 · Updated last year