intuit / sac3

Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency

☆35

Alternatives and similar repositories for sac3

Users who are interested in sac3 are comparing it to the repositories listed below.

microsoft / HaDes
Token-level Reference-free Hallucination Detection
☆94 Updated last year
OSU-NLP-Group / AttrScore
Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"
☆56 Updated 2 years ago
anthonywchen / RARR
RARR: Researching and Revising What Language Models Say, Using Language Models
☆48 Updated 2 years ago
McGill-NLP / instruct-qa
Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
☆85 Updated 11 months ago
WENGSYX / Self-Verification
We have released the code and demo program required for LLM with self-verification
☆60 Updated last year
veronica320 / Faithful-COT
Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
☆161 Updated last year
Betswish / MIRAGE
Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/
☆24 Updated 4 months ago
sunlab-osu / Understanding-CoT
☆87 Updated 2 years ago
google-research-datasets / GSM-IC
Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…
☆60 Updated 2 years ago
balevinstein / Probes
☆51 Updated 2 years ago
eladsegal / strategyqa
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
☆75 Updated 2 years ago
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆116 Updated last year
allenai / DecomP
Repository for Decomposed Prompting
☆91 Updated last year
HKUNLP / icl-ceil
[ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.
☆102 Updated 2 years ago
hitz-zentroa / lm-contamination
The LM Contamination Index is a manually created database of contamination evidences for LMs.
☆78 Updated last year
asaparov / prontoqa
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
☆147 Updated 8 months ago
chaochun / nlu-asdiv-dataset
☆48 Updated 2 years ago
xlang-ai / icl-selective-annotation
[ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"
☆108 Updated last year
McGill-NLP / retriever-lm-reasoning
Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20…
☆28 Updated last year
violet-zct / fairseq-detect-hallucination
Detect hallucinated tokens for conditional sequence generation.
☆64 Updated 3 years ago
hkust-nlp / felm
Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆59 Updated last year
google-research / true
Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".
☆81 Updated 3 weeks ago
yuh-zha / AlignScore
ACL2023 - AlignScore, a metric for factual consistency evaluation.
☆132 Updated last year
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68 Updated 2 years ago
faridlazuarda / cultural-llm-papers
A curated list of research papers and resources on Cultural LLM.
☆45 Updated 9 months ago
oriyor / ret-robust
Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context"
☆69 Updated 11 months ago
nayeon7lee / FactualityPrompt
☆86 Updated 2 years ago
chaitanyamalaviya / ExpertQA
[Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers
☆131 Updated last year
jaehunjung1 / Maieutic-Prompting
☆50 Updated last year
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆77 Updated 6 months ago