hitz-zentroa / This-is-not-a-Dataset
We introduce a large, semi-automatically generated dataset of ~400,000 descriptive sentences about commonsense knowledge, each of which can be true or false. Negation is present, in different forms, in about 2/3 of the corpus. We use the dataset to evaluate LLMs.
☆13Updated 11 months ago
Alternatives and similar repositories for This-is-not-a-Dataset:
Users who are interested in This-is-not-a-Dataset are comparing it to the repositories listed below.
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- Finding semantically meaningful and accurate prompts.☆46Updated last year
- Minimum Description Length probing for neural network representations☆19Updated 2 months ago
- Aioli: A unified optimization framework for language model data mixing☆23Updated 3 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆31Updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…☆14Updated last year
- Are foundation LMs multilingual knowledge bases? (EMNLP 2023)☆19Updated last year
- Efficiently computing & storing token n-grams from large corpora☆23Updated 6 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆24Updated last year
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆65Updated 2 years ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks☆41Updated 4 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆48Updated last year
- Learning to route instances for Human vs AI Feedback☆23Updated 2 months ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)☆22Updated last year
- Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)☆10Updated 3 years ago
- Measuring and Controlling Persona Drift in Language Model Dialogs☆17Updated last year
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval☆14Updated last year
- Can LLMs generate code-mixed sentences through zero-shot prompting?☆11Updated 2 years ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment☆38Updated last year
- [COLM '24] Source-Aware Training Enables Knowledge Attribution in Language Models☆17Updated 3 weeks ago
- Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)☆61Updated last year
- The codebase for our ACL2023 paper: Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learni…☆29Updated last year
- The data and the PyTorch implementation for the models and experiments in the paper "Exploiting Asymmetry for Synthetic Training Data Gen…☆60Updated last year