intuit / sac3
Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
☆35 · Updated 2 months ago
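SAC3's core idea is to cross-check an LLM's answers for consistency across semantically equivalent rephrasings of a question (and, in the paper, across models), treating inconsistency as a hallucination signal. Below is a minimal sketch of that idea, not the repo's actual API: `ask_llm` and `rephrase` are hypothetical stand-ins for whatever black-box model calls you have available.

```python
from collections import Counter


def ask_llm(prompt: str) -> str:
    """Hypothetical black-box LLM call; swap in your own model client."""
    raise NotImplementedError


def rephrase(question: str, n: int) -> list[str]:
    """Hypothetical helper returning n semantically equivalent rephrasings."""
    raise NotImplementedError


def cross_check_consistency(question: str, n_rephrasings: int = 5) -> float:
    """Return the fraction of answers that agree with the majority answer.

    A low score means the answer is unstable across semantically
    equivalent questions, which SAC3 treats as a hallucination signal.
    """
    questions = [question] + rephrase(question, n_rephrasings)
    answers = [ask_llm(q).strip().lower() for q in questions]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)
```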
Alternatives and similar repositories for sac3:
Users who are interested in sac3 are comparing it to the libraries listed below.
- Code, datasets, and models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 · Updated last year
- Token-level Reference-free Hallucination Detection ☆94 · Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆58 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆114 · Updated last year
- Instructions and demonstrations for building a GLM capable of formal logical reasoning ☆53 · Updated 6 months ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆21 · Updated 3 weeks ago
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆78 · Updated 11 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆123 · Updated 8 months ago
- ☆44 · Updated 6 months ago
- ☆47 · Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆51 · Updated 7 months ago
- Code and demo program for LLMs with self-verification ☆58 · Updated last year
- RARR: Researching and Revising What Language Models Say, Using Language Models ☆46 · Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆67 · Updated 11 months ago
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024) ☆13 · Updated 5 months ago
- This repository contains data, code and models for contextual noncompliance. ☆20 · Updated 8 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 · Updated 7 months ago
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20… ☆28 · Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated last year
- ☆47 · Updated last year
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 ☆37 · Updated last year
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆59 · Updated last year
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023). ☆25 · Updated 7 months ago
- Code and data for the FACTOR paper ☆44 · Updated last year
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆47 · Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated 2 years ago
- Restore safety in fine-tuned language models through task arithmetic ☆28 · Updated last year
- The Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K by adding irrelevant se… ☆58 · Updated 2 years ago
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆139 · Updated 5 months ago
- WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000… ☆47 · Updated last year