SalesforceAIResearch / FaithEval

☆54

Alternatives and similar repositories for FaithEval

Users who are interested in FaithEval are comparing it to the repositories listed below.

shizhediao / R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…
☆126 Updated last year
OSU-NLP-Group / LLM-Knowledge-Conflict
[ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"
☆78 Updated last year
xlang-ai / BRIGHT
[ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
☆179 Updated 2 months ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆149 Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆134 Updated last year
ParticleMedia / RAGTruth
GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models"
☆215 Updated last year
DataArcTech / LLM-as-a-Judge
☆158 Updated last month
zankner / CLoud
Critique-out-Loud Reward Models
☆70 Updated last year
weizhepei / InstructRAG
[ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
☆132 Updated 10 months ago
ZitongYang / Synthetic_Continued_Pretraining
Code implementation of synthetic continued pretraining
☆142 Updated 11 months ago
liyucheng09 / Contamination_Detector
Lightweight tool to identify Data Contamination in LLMs evaluation
☆52 Updated last year
texttron / BrowseComp-Plus
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
☆123 Updated last month
oriyor / ret-robust
Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context"
☆75 Updated last year
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆130 Updated last year
zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆159 Updated 3 weeks ago
Anni-Zou / DocBench
DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
☆59 Updated last year
declare-lab / trust-align
Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…
☆68 Updated 9 months ago
ScalerLab / JudgeBench
☆105 Updated last year
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆123 Updated last year
kevinwu23 / StanfordClashEval
☆37 Updated 10 months ago
abhika-m / FAVA
☆75 Updated last year
TIGER-AI-Lab / LongICLBench
Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]
☆110 Updated 9 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆67 Updated last year
Zayne-sprague / MuSR
☆56 Updated last year
gankim / tree-of-clarifications
🌲 Code for our EMNLP 2023 paper - 🎄 "Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Mode…
☆52 Updated 2 years ago
yueyu1030 / AttrPrompt
[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
☆156 Updated 2 years ago
shengliu66 / ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
☆193 Updated 9 months ago
jlko / long_hallucinations
Codebase for reproducing the experiments of the semantic uncertainty paper (paragraph-length experiments).
☆76 Updated last year
abertsch72 / long-context-icl
Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration"
☆41 Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆43 Updated last year