apartresearch / specificityplus

👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"

☆20

Alternatives and similar repositories for specificityplus

Users who are interested in specificityplus are comparing it to the libraries listed below.

awebson / prompt_semantics
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”
☆85 Updated 3 years ago
kayoyin / interpret-lm
Interpreting Language Models with Contrastive Explanations (EMNLP 2022 Best Paper Honorable Mention)
☆62 Updated 3 years ago
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆116 Updated 2 years ago
google / belief-localization
This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca…
☆61 Updated 2 years ago
nkandpa2 / long_tail_knowledge
Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"
☆77 Updated 2 years ago
microsoft / HaDes
Token-level Reference-free Hallucination Detection
☆96 Updated 2 years ago
yanaiela / pararel
☆45 Updated last year
mega002 / ff-layers
The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…
☆94 Updated 3 years ago
google-research / true
Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".
☆81 Updated 3 weeks ago
hitz-zentroa / lm-contamination
The LM Contamination Index is a manually created database of contamination evidences for LMs.
☆78 Updated last year
ryokamoi / wice
This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.
☆41 Updated last year
Betswish / MIRAGE
Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/
☆24 Updated 5 months ago
liujch1998 / memo-trap
☆22 Updated 2 years ago
McGill-NLP / instruct-qa
Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
☆86 Updated 11 months ago
guy-dar / embedding-space
☆54 Updated 2 years ago
huggingface / that_is_good_data
☆66 Updated 2 years ago
jzbjyb / lm-calibration
☆35 Updated 3 years ago
nicola-decao / KnowledgeEditor
Code for Editing Factual Knowledge in Language Models
☆139 Updated 3 years ago
aryamanarora / causalgym
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆46 Updated 8 months ago
ekinakyurek / influence
Code for "Tracing Knowledge in Language Models Back to the Training Data"
☆38 Updated 2 years ago
lifan-yuan / OOD_NLP
[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…
☆34 Updated 2 years ago
jaehunjung1 / Maieutic-Prompting
☆51 Updated last year
nayeon7lee / FactualityPrompt
☆87 Updated 2 years ago
ruiqi-zhong / DescribeDistributionalDifferences
Code for preprint: Summarizing Differences between Text Distributions with Natural Language
☆42 Updated 2 years ago
hkust-nlp / felm
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆59 Updated last year
McGill-NLP / FaithDial
☆51 Updated 2 years ago
salesforce / creativity_eval
☆37 Updated last month
wzhouad / context-faithful-llm
Code and data for paper "Context-faithful Prompting for Large Language Models".
☆41 Updated 2 years ago
eladsegal / strategyqa
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
☆76 Updated 2 years ago
joeljang / ELM
[ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning
☆99 Updated 2 years ago