An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
☆60Aug 13, 2024Updated last year
Alternatives and similar repositories for LLMSanitize
Users that are interested in LLMSanitize are comparing it to the libraries listed below
Sorting:
- DICE: Detecting In-distribution Data Contamination with LLM's Internal State☆11Sep 21, 2024Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation.☆110Jan 29, 2026Updated last month
- ☆23Jul 5, 2024Updated last year
- triton ver of gqa flash attn, based on the tutorial☆12Aug 4, 2024Updated last year
- ☆33Feb 15, 2026Updated 2 weeks ago
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling☆34Nov 8, 2024Updated last year
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua…☆13Nov 11, 2024Updated last year
- [ACL 2023] Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation☆14Jul 11, 2023Updated 2 years ago
- ☆16Nov 26, 2024Updated last year
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"☆14Jun 21, 2024Updated last year
- Website for HKU NLP group (under construction)☆14Dec 23, 2025Updated 2 months ago
- ☆15Aug 18, 2022Updated 3 years ago
- [ICML 2024] Self-Infilling Code Generation☆18May 5, 2024Updated last year
- ☆23Jun 30, 2025Updated 8 months ago
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs☆19Mar 20, 2025Updated 11 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆51Dec 23, 2024Updated last year
- ☆19Jul 24, 2025Updated 7 months ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering☆18Oct 9, 2024Updated last year
- An evaluation suite for Retrieval-Augmented Generation (RAG).☆23Apr 26, 2025Updated 10 months ago
- Repository for the EMNLP 2023 Demo Paper "Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data"☆19Jan 27, 2025Updated last year
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…☆242Nov 3, 2023Updated 2 years ago
- EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation☆41Oct 19, 2022Updated 3 years ago
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…☆22Nov 2, 2021Updated 4 years ago
- The implementation of the paper "Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters".☆17May 24, 2022Updated 3 years ago
- We construct and introduce DIALFACT, a testing benchmark dataset crowd-annotated conversational claims, paired with pieces of evidence fr…☆44Oct 12, 2022Updated 3 years ago
- ☆21May 24, 2024Updated last year
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022.☆24Nov 23, 2022Updated 3 years ago
- ☆19Nov 6, 2023Updated 2 years ago
- A hot-pluggable tool for visualizing LLaVA's attention.☆24Jan 29, 2024Updated 2 years ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Jan 27, 2023Updated 3 years ago
- An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation☆27Jun 7, 2024Updated last year
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering.☆16Mar 20, 2023Updated 2 years ago
- ☆21Oct 26, 2021Updated 4 years ago
- A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.☆62Oct 21, 2024Updated last year
- PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialog…☆29Oct 4, 2021Updated 4 years ago
- A MBTI test on Large Language Model like GPT-3.☆27May 2, 2022Updated 3 years ago
- Exploring Few-Shot Adaptation of Language Models with Tables☆24Aug 22, 2022Updated 3 years ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation☆24May 1, 2022Updated 3 years ago