ntunlp / LLMSanitize

An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).

☆57

Alternatives and similar repositories for LLMSanitize

Users interested in LLMSanitize are comparing it to the libraries listed below.


epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆78 · Updated last year
lyy1994 / awesome-data-contamination
A paper list on data contamination in Large Language Model evaluation.
☆98 · Updated 2 weeks ago
liyucheng09 / Contamination_Detector
A lightweight tool to identify data contamination in LLM evaluation.
☆51 · Updated last year
shizhediao / R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…
☆114 · Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆127 · Updated last year
hkust-nlp / felm
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆59 · Updated last year
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68 · Updated 2 years ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆72 · Updated 8 months ago
google-research-datasets / GSM-IC
Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…
☆60 · Updated 2 years ago
zthang / Focus
☆20 · Updated last year
GAIR-NLP / alignment-for-honesty
☆74 · Updated last year
yinzhangyue / SelfAware
Do Large Language Models Know What They Don’t Know?
☆99 · Updated 8 months ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆62 · Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆67 · Updated last year
edenbiran / RippleEdits
Evaluating the Ripple Effects of Knowledge Editing in Language Models
☆56 · Updated last year
hitz-zentroa / lm-contamination
The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆78 · Updated last year
FranxYao / FlanT5-CoT-Specialization
Implementation of the ICML 2023 paper "Specializing Smaller Language Models towards Multi-Step Reasoning".
☆131 · Updated 2 years ago
swj0419 / in-context-pretraining
☆53 · Updated last year
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75 · Updated 2 months ago
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆114 · Updated 10 months ago
Nanami18 / Snowballed_Hallucination
☆45 · Updated 11 months ago
wzhouad / context-faithful-llm
Code and data for paper "Context-faithful Prompting for Large Language Models".
☆41 · Updated 2 years ago
qinyiwei / InfoBench
☆55 · Updated 11 months ago
edenbiran / HoppingTooLate
Exploring the Limitations of Large Language Models on Multi-Hop Queries
☆27 · Updated 5 months ago
HKUNLP / icl-ceil
[ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.
☆102 · Updated 2 years ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆29 · Updated last year
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36 · Updated last year
Zce1112zslx / IKE
☆41 · Updated last year
nayeon7lee / FactualityPrompt
☆87 · Updated 2 years ago
swj0419 / detect-pretrain-code
This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…
☆228 · Updated last year