hitz-zentroa / lm-contaminationView external linksLinks
The LM Contamination Index is a manually created database of contamination evidences for LMs.
☆82Apr 11, 2024Updated last year
Alternatives and similar repositories for lm-contamination
Users that are interested in lm-contamination are comparing it to the libraries listed below
Sorting:
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.☆13Jan 9, 2024Updated 2 years ago
- ☆23Dec 18, 2024Updated last year
- A repository for ACL 2022 paper "How do we answer complex questions: Discourse structure of long form answers"☆19May 31, 2025Updated 8 months ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).☆20May 14, 2022Updated 3 years ago
- The Paper List on Data Contamination for Large Language Models Evaluation.☆110Jan 29, 2026Updated 2 weeks ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022.☆23Nov 23, 2022Updated 3 years ago
- Do Multilingual Language Models Think Better in English?☆42Aug 3, 2023Updated 2 years ago
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles".☆24Mar 25, 2025Updated 10 months ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".☆66Apr 18, 2023Updated 2 years ago
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical …☆14Jan 25, 2026Updated 3 weeks ago
- Code base for ACL 2021 paper, Weakly Supervised Named Entity Tagging with Learnable Logical Rules.☆20Jun 27, 2023Updated 2 years ago
- EACL 2017☆26Apr 22, 2018Updated 7 years ago
- Official Repository for Dataset Inference for LLMs☆43Jul 25, 2024Updated last year
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation☆24May 1, 2022Updated 3 years ago
- Code for paper Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase Generation Approach by Zhe Lin, Xiaojun Wan. This…☆14Aug 10, 2021Updated 4 years ago
- ☆10Oct 17, 2021Updated 4 years ago
- ☆11Jul 15, 2020Updated 5 years ago
- ☆11Jun 5, 2024Updated last year
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"☆13Feb 14, 2022Updated 4 years ago
- ☆11Jan 3, 2023Updated 3 years ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]☆79Nov 14, 2024Updated last year
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing☆102Jul 25, 2024Updated last year
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs☆189Oct 12, 2023Updated 2 years ago
- The repository contains code for Adaptive Data Optimization☆32Dec 9, 2024Updated last year
- ☆12Updated this week
- Source Code for "Adapters for Enhanced Modeling of Multilingual Knowledge and Text"☆12Oct 28, 2022Updated 3 years ago
- ⚡️Lightweight framework for NLP research, based on PyTorch⚡️☆12Apr 5, 2023Updated 2 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated 10 months ago
- Few-shot Learning with Auxiliary Data☆31Dec 8, 2023Updated 2 years ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.☆27Feb 4, 2023Updated 3 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer☆54Nov 21, 2022Updated 3 years ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in …☆30Nov 25, 2021Updated 4 years ago
- ☆13Dec 12, 2025Updated 2 months ago
- ☆11Oct 3, 2021Updated 4 years ago
- ☆28May 4, 2023Updated 2 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.☆13Nov 21, 2023Updated 2 years ago
- ☆12Mar 31, 2020Updated 5 years ago
- ☆12Jan 2, 2022Updated 4 years ago
- exBERT on Transformers🤗☆10Jun 14, 2021Updated 4 years ago