hkust-nlp/felm

GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)

☆65

Alternatives and similar repositories for felm

Users who are interested in felm are comparing it to the repositories listed below.

zthang / Focus
☆24 · Feb 3, 2024 · Updated 2 years ago
HKUST-KnowComp / SubeventWriter
Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Cohere…
☆11 · Oct 16, 2022 · Updated 3 years ago
AI21Labs / factor
Code and data for the FACTOR paper
☆54 · Nov 15, 2023 · Updated 2 years ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆64 · Mar 30, 2024 · Updated 2 years ago
RUCAIBox / HaluEval
This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.
☆592 · Feb 12, 2024 · Updated 2 years ago
anthonywchen / RARR
RARR: Researching and Revising What Language Models Say, Using Language Models
☆54 · Jun 22, 2023 · Updated 3 years ago
yuxiaw / Factcheck-GPT
Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation.
☆116 · Jan 6, 2024 · Updated 2 years ago
potsawee / selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
☆628 · Jun 26, 2024 · Updated 2 years ago
real-absolute-AI / Unnatural_Language
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24 · May 20, 2025 · Updated last year
shmsw25 / FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…
☆450 · Apr 13, 2025 · Updated last year
Nanami18 / Snowballed_Hallucination
☆43 · Sep 3, 2024 · Updated last year
HKUST-KnowComp / Knowledge-Constrained-Decoding
Official Code for EMNLP2023 Main Conference paper: "KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detec…
☆30 · Nov 14, 2023 · Updated 2 years ago
HKUST-KnowComp / AbsPyramid
Official code repository for the paper: AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Grap…
☆13 · Oct 30, 2024 · Updated last year
Shentao-YANG / Preference_Grounded_Guidance
Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17 · Jan 8, 2025 · Updated last year
RUCAIBox / HaluEval-2.0
☆50 · Jan 7, 2024 · Updated 2 years ago
psunlpgroup / ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
☆32 · Aug 18, 2024 · Updated last year
jonnypei / acl23-preadd
☆12 · Jul 25, 2023 · Updated 3 years ago
Yale-LILY / ROSE
☆41 · Jun 7, 2023 · Updated 3 years ago
lingo-mit / lm-truthfulness
☆17 · Dec 21, 2023 · Updated 2 years ago
XiaojuanTang / ICSR
Implementation of the paper "Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners"
☆20 · Aug 17, 2023 · Updated 2 years ago
armingh2000 / FactScoreLite
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package bu…
☆14 · Apr 25, 2024 · Updated 2 years ago
abhika-m / FAVA
☆77 · Feb 16, 2024 · Updated 2 years ago
balevinstein / Probes
☆58 · Jun 30, 2023 · Updated 3 years ago
eth-sri / ChatProtect
This is the code for the paper "Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation".
☆38 · Apr 15, 2026 · Updated 3 months ago
yale-nlp / ODSum
Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization"
☆11 · Sep 20, 2024 · Updated last year
microsoft / HaDes
Token-level Reference-free Hallucination Detection
☆97 · Jul 25, 2023 · Updated 3 years ago
yale-nlp / InstruSum
☆23 · Feb 26, 2024 · Updated 2 years ago
hkust-nlp / PEM_composition
[NeurIPS 2023] GitHub repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61 · Nov 26, 2023 · Updated 2 years ago
EdinburghNLP / awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
☆1,119 · Updated this week
yhcc / utcie
This is the code repo for the paper <UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction>
☆15 · Aug 10, 2023 · Updated 2 years ago
allenai / better-promptability
☆11 · Nov 27, 2022 · Updated 3 years ago
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆600 · Dec 9, 2024 · Updated last year
snu-mllab / Bayesian-Red-Teaming
Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)
☆15 · Jul 9, 2023 · Updated 3 years ago
zjunlp / FactCHD
[IJCAI 2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection
☆90 · Apr 28, 2024 · Updated 2 years ago
hkust-nlp / deepsearch-tts
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
☆21 · Oct 8, 2025 · Updated 9 months ago
ShujinWu-0814 / MACAROON
Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"
☆14 · Sep 28, 2024 · Updated last year
141forever / DiaHalu
This is the repository for the paper 'DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models' (EMNLP2024 …
☆18 · Apr 5, 2025 · Updated last year
chaitanyamalaviya / ExpertQA
[Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers
☆139 · Mar 14, 2024 · Updated 2 years ago
nayeon7lee / FactualityPrompt
☆90 · Nov 11, 2022 · Updated 3 years ago