GAIR-NLP/factool

FacTool: Factuality Detection in Generative AI

☆933

Alternatives and similar repositories for factool

Users who are interested in factool are comparing it to the libraries listed below.

GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆35 · Aug 15, 2024 · Updated last year
GAIR-NLP / alignment-for-honesty
☆78 · May 22, 2024 · Updated 2 years ago
RUCAIBox / HaluEval
This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.
☆592 · Feb 12, 2024 · Updated 2 years ago
shmsw25 / FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…
☆450 · Apr 13, 2025 · Updated last year
OpenMOSS / HalluQA
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
☆139 · Jun 5, 2024 · Updated 2 years ago
yuxiaw / Factcheck-GPT
Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation.
☆116 · Jan 6, 2024 · Updated 2 years ago
neulab / prompt2model
prompt2model - Generate Deployable Models from Natural Language Instructions
☆2,017 · Dec 29, 2024 · Updated last year
GAIR-NLP / auto-j
Generative Judge for Evaluating Alignment
☆251 · Jan 18, 2024 · Updated 2 years ago
OpenBMB / ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
☆5,706 · May 21, 2025 · Updated last year
GAIR-NLP / OPO
☆50 · Mar 2, 2024 · Updated 2 years ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
GAIR-NLP / Safety-J
Safety-J: Evaluating Safety with Critique
☆16 · Jul 28, 2024 · Updated last year
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,597 · Feb 8, 2026 · Updated 5 months ago
GAIR-NLP / abel
SOTA Math Opensource LLM
☆335 · Dec 12, 2023 · Updated 2 years ago
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆37 · Jun 1, 2024 · Updated 2 years ago
HillZhang1999 / llm-hallucination-survey
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large …
☆1,085 · Sep 27, 2025 · Updated 9 months ago
GAIR-NLP / MoPS
[ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"
☆46 · Jul 19, 2024 · Updated 2 years ago
MikeWangWZHL / Solo-Performance-Prompting
Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration"
☆352 · May 8, 2024 · Updated 2 years ago
google-deepmind / long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
☆692 · Jun 18, 2026 · Updated last month
thunlp / UltraChat
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
☆2,874 · Mar 13, 2024 · Updated 2 years ago
GAIR-NLP / MathPile
[NeurIPS D&B 2024] Generative AI for Math: MathPile
☆418 · Apr 4, 2025 · Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆43 · Feb 15, 2024 · Updated 2 years ago
princeton-nlp / ALCE
[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
☆522 · Oct 9, 2024 · Updated last year
microsoft / ToRA
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting wit…
☆1,119 · Feb 22, 2024 · Updated 2 years ago
potsawee / selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
☆628 · Jun 26, 2024 · Updated 2 years ago
GanjinZero / RRHF
[NIPS2023] RRHF & Wombat
☆805 · Sep 22, 2023 · Updated 2 years ago
PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,611 · Nov 24, 2025 · Updated 8 months ago
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,753 · Jan 8, 2024 · Updated 2 years ago
AkariAsai / self-rag
This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,…
☆2,410 · May 25, 2024 · Updated 2 years ago
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,151 · Jun 1, 2023 · Updated 3 years ago
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,957 · Apr 13, 2026 · Updated 3 months ago
ruixiangcui / AGIEval
☆774 · Jun 13, 2024 · Updated 2 years ago
IBM / Dromedary
Dromedary: towards helpful, ethical and reliable LLMs.
☆1,138 · Sep 18, 2025 · Updated 10 months ago
MLGroupJLU / LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
☆1,609 · Apr 17, 2026 · Updated 3 months ago
RCGAI / SimplyRetrieve
Lightweight chat AI platform featuring custom knowledge, open-source LLMs, prompt-engineering, retrieval analysis. Highly customizable. F…
☆218 · Feb 14, 2024 · Updated 2 years ago
anthonywchen / RARR
RARR: Researching and Revising What Language Models Say, Using Language Models
☆54 · Jun 22, 2023 · Updated 3 years ago
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,776 · Aug 4, 2024 · Updated last year
GAIR-NLP / weak-to-strong-reasoning
☆59 · Sep 2, 2024 · Updated last year