potsawee / selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
⭐ 534 · Updated 11 months ago
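SelfCheckGPT's core idea is that details the model actually knows tend to reappear across several stochastically sampled responses, while hallucinated details do not. The sketch below is a minimal, self-contained illustration of that sample-and-compare step using a smoothed unigram model over the sampled passages (loosely in the spirit of the repo's n-gram variant); the function names are mine and this is not the package's API.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; deliberately simple for illustration."""
    return re.findall(r"[a-z0-9']+", text.lower())

def sentence_inconsistency_scores(
    response_sentences: list[str],
    sampled_passages: list[str],
    alpha: float = 1.0,
) -> list[float]:
    """Score each sentence of the main response by how surprising its tokens
    are under an add-alpha-smoothed unigram model of the sampled passages.
    Higher score = less support from the samples = more likely hallucinated."""
    counts: Counter[str] = Counter()
    for passage in sampled_passages:
        counts.update(tokenize(passage))
    total = sum(counts.values())
    vocab = len(counts) + 1  # extra slot for unseen tokens

    scores = []
    for sentence in response_sentences:
        tokens = tokenize(sentence)
        if not tokens:
            scores.append(0.0)
            continue
        neg_logprob = sum(
            -math.log((counts[tok] + alpha) / (total + alpha * vocab))
            for tok in tokens
        )
        scores.append(neg_logprob / len(tokens))  # length-normalised
    return scores

# The second sentence contains a claim absent from every sampled passage,
# so it receives the higher (more suspicious) score.
main_response = [
    "Paris is the capital of France.",
    "It was founded by Julius Caesar in 52 BC.",
]
samples = [
    "Paris is the capital and largest city of France.",
    "The capital of France is Paris, located on the Seine.",
    "France's capital city is Paris.",
]
print(sentence_inconsistency_scores(main_response, samples))
```

The repo's stronger scoring variants (BERTScore-, QA-, NLI-, and LLM-prompt-based) replace the unigram model, but the sample-and-compare structure is the same.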
Alternatives and similar repositories for selfcheckgpt
Users interested in selfcheckgpt are comparing it to the libraries listed below.
- This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models. ⭐ 479 · Updated last year
- Evaluate your LLM's response with Prometheus and GPT4 💯 ⭐ 952 · Updated last month
- List of papers on hallucination detection in LLMs. ⭐ 896 · Updated last week
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ⭐ 185 · Updated 6 months ago
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ⭐ 498 · Updated 5 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ⭐ 327 · Updated last year
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ⭐ 490 · Updated 8 months ago
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d… ⭐ 299 · Updated last year
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ⭐ 530 · Updated 4 months ago
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" ⭐ 347 · Updated last year
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ⭐ 254 · Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ⭐ 353 · Updated 2 months ago
- RefChecker provides an automatic checking pipeline and a benchmark dataset for detecting fine-grained hallucinations generated by Large Langua… ⭐ 373 · Updated last month
- Code for the paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ⭐ 351 · Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels… ⭐ 268 · Updated last year
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ⭐ 465 · Updated last week
- Generative Representational Instruction Tuning ⭐ 651 · Updated 3 months ago
- Automated Evaluation of RAG Systems ⭐ 609 · Updated 2 months ago
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ⭐ 703 · Updated 8 months ago
- ⭐ 283 · Updated last year
- Forward-Looking Active REtrieval-augmented generation (FLARE) ⭐ 636 · Updated last year
- Data and Code for Program of Thoughts (TMLR 2023) ⭐ 276 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ⭐ 836 · Updated 10 months ago
- Repository for "MultiHop-RAG: A Dataset for Evaluating Retrieval-Augmented Generation Across Documents" (COLM 2024) ⭐ 326 · Updated 2 months ago
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. ⭐ 367 · Updated last year
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ⭐ 546 · Updated last year
- Official repository for ORPO ⭐ 455 · Updated last year
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ⭐ 363 · Updated last week
- RewardBench: the first evaluation tool for reward models. ⭐ 604 · Updated last week
- [ACL 2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive… ⭐ 946 · Updated 8 months ago