lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆110 · Jan 29, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for awesome-data-contamination
Users who are interested in awesome-data-contamination are comparing it to the libraries listed below.
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆59 · Aug 13, 2024 · Updated last year
- ☆16 · Nov 26, 2024 · Updated last year
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆82 · Apr 11, 2024 · Updated last year
- A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate. ☆38 · Feb 3, 2026 · Updated last week
- ☆19 · Oct 24, 2023 · Updated 2 years ago
- Code and data for the NAACL 2025 paper "IHEval: Evaluating Language Models on Following the Instruction Hierarchy" ☆16 · Feb 25, 2025 · Updated 11 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆72 · Jan 16, 2026 · Updated 3 weeks ago
- DICE: Detecting In-distribution Data Contamination with LLM's Internal State ☆11 · Sep 21, 2024 · Updated last year
- SVIP: Towards Verifiable Inference of Open-Source Large Language Models ☆14 · Jun 3, 2025 · Updated 8 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆28 · May 23, 2024 · Updated last year
- Xlore2.0 code [BaiduExtractor, HudongExtractor, WikiExtractor, XloreData, XloreWeb] ☆12 · Apr 5, 2017 · Updated 8 years ago
- Latest Evaluation Toolkit (LatestEval): assessing language models with the latest, uncontaminated materials. ☆28 · Feb 17, 2025 · Updated 11 months ago
- Longitudinal Evaluation of LLMs via Data Compression ☆33 · May 29, 2024 · Updated last year
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆240 · Nov 3, 2023 · Updated 2 years ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆34 · Aug 15, 2024 · Updated last year
- ☆35 · May 9, 2025 · Updated 9 months ago
- Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning ☆29 · Sep 12, 2025 · Updated 5 months ago
- Code and data for the EMNLP 2021 paper "Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Re… ☆16 · Oct 15, 2022 · Updated 3 years ago
- Trainable embedding transformation for confidence estimation, feature extraction, explainability and conversion from dense to sparse. ☆26 · Jun 9, 2025 · Updated 8 months ago
- Paper list for the paper "Authorship Attribution in the Era of Large Language Models: Problems, Methodologies, and Challenges (SIGKDD Exp… ☆18 · Dec 23, 2024 · Updated last year
- ☆35 · Mar 10, 2025 · Updated 11 months ago
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist ☆35 · Oct 23, 2024 · Updated last year
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs ☆19 · Mar 20, 2025 · Updated 10 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,212 · Nov 16, 2024 · Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Jun 3, 2024 · Updated last year
- Source code for the paper "CAT: Interpretable Concept-based Taylor Additive Models". ☆18 · Aug 26, 2024 · Updated last year
- Evaluate GPT-4o on CLIcK (a Korean NLP dataset) ☆20 · May 18, 2024 · Updated last year
- Used for intervening in the thinking process of reasoning models such as DeepSeek-R1, effectively controlling how they reason. ☆24 · Apr 14, 2025 · Updated 10 months ago
- Official code and dataset repository of KoBBQ (TACL 2024) ☆19 · May 13, 2024 · Updated last year
- The implementation of the paper "Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters". ☆17 · May 24, 2022 · Updated 3 years ago
- Benchmarking membership inference attacks (MIAs) against LLMs. ☆28 · Oct 8, 2024 · Updated last year
- Machine learning code, including self-training code ☆18 · Jan 15, 2023 · Updated 3 years ago
- AskUp Search ChatGPT Plugin ☆20 · May 27, 2023 · Updated 2 years ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆94 · Nov 17, 2024 · Updated last year
- PyTorch code for System-1.x: Learning to Balance Fast and Slow Planning with Language Models ☆24 · Jul 22, 2024 · Updated last year
- Benchmarking Optimizers for LLM Pretraining ☆50 · Dec 30, 2025 · Updated last month
- Lightweight tool to identify data contamination in LLM evaluation ☆53 · Mar 8, 2024 · Updated last year
- Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models … ☆2,667 · Updated this week
- ☆23 · Jul 5, 2024 · Updated last year