The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆82 · Apr 11, 2024 · Updated last year
Alternatives and similar repositories for lm-contamination
Users who are interested in lm-contamination compare it to the libraries listed below.
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library. ☆13 · Jan 9, 2024 · Updated 2 years ago
- ☆23 · Dec 18, 2024 · Updated last year
- A repository for ACL 2022 paper "How do we answer complex questions: Discourse structure of long form answers" ☆19 · May 31, 2025 · Updated 9 months ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · May 14, 2022 · Updated 3 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆24 · Nov 23, 2022 · Updated 3 years ago
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles". ☆24 · Mar 25, 2025 · Updated 11 months ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆66 · Apr 18, 2023 · Updated 2 years ago
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical … ☆14 · Jan 25, 2026 · Updated last month
- Code base for ACL 2021 paper, Weakly Supervised Named Entity Tagging with Learnable Logical Rules. ☆20 · Jun 27, 2023 · Updated 2 years ago
- EACL 2017 ☆26 · Apr 22, 2018 · Updated 7 years ago
- Official Repository for Dataset Inference for LLMs ☆42 · Jul 25, 2024 · Updated last year
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation ☆24 · May 1, 2022 · Updated 3 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation" ☆13 · Feb 14, 2022 · Updated 4 years ago
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- ☆11 · Jan 3, 2023 · Updated 3 years ago
- ☆10 · Oct 17, 2021 · Updated 4 years ago
- ☆11 · Jun 5, 2024 · Updated last year
- Code for paper Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase Generation Approach by Zhe Lin, Xiaojun Wan. This… ☆14 · Aug 10, 2021 · Updated 4 years ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆79 · Nov 14, 2024 · Updated last year
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing ☆102 · Jul 25, 2024 · Updated last year
- The repository contains code for Adaptive Data Optimization ☆32 · Dec 9, 2024 · Updated last year
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization ☆31 · Jan 31, 2023 · Updated 3 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 11 months ago
- ☆10 · Dec 12, 2023 · Updated 2 years ago
- Source Code for "Adapters for Enhanced Modeling of Multilingual Knowledge and Text" ☆12 · Oct 28, 2022 · Updated 3 years ago
- ⚡️Lightweight framework for NLP research, based on PyTorch⚡️ ☆12 · Apr 5, 2023 · Updated 2 years ago
- ☆13 · May 21, 2024 · Updated last year
- ☆12 · Feb 11, 2026 · Updated 3 weeks ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆27 · Feb 4, 2023 · Updated 3 years ago
- Few-shot Learning with Auxiliary Data ☆31 · Dec 8, 2023 · Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 · Nov 21, 2022 · Updated 3 years ago
- ☆12 · Jan 2, 2022 · Updated 4 years ago
- Code and dataset for Polyglot Prompting: Multilingual Multitask Prompt Training. ☆18 · Dec 7, 2022 · Updated 3 years ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆30 · Nov 25, 2021 · Updated 4 years ago
- exBERT on Transformers🤗 ☆10 · Jun 14, 2021 · Updated 4 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 · Nov 21, 2023 · Updated 2 years ago
- Python package to augment multilingual data ☆15 · Feb 15, 2023 · Updated 3 years ago
- ☆28 · May 4, 2023 · Updated 2 years ago
- ☆13 · Dec 12, 2025 · Updated 2 months ago