The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆82 · Apr 11, 2024 · Updated last year
Alternatives and similar repositories for lm-contamination
Users who are interested in lm-contamination are comparing it to the libraries listed below.
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 · Nov 21, 2023 · Updated 2 years ago
- Do Multilingual Language Models Think Better in English? ☆42 · Aug 3, 2023 · Updated 2 years ago
- ☆23 · Dec 18, 2024 · Updated last year
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · May 14, 2022 · Updated 3 years ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆66 · Apr 18, 2023 · Updated 2 years ago
- ☆11 · Jun 5, 2024 · Updated last year
- ZS4IE: A Toolkit for Zero-Shot Information Extraction with Simple Verbalizations ☆29 · Mar 28, 2022 · Updated 4 years ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆112 · Jan 29, 2026 · Updated 2 months ago
- ☆11 · Jan 2, 2022 · Updated 4 years ago
- Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models" ☆17 · Mar 29, 2024 · Updated 2 years ago
- ☆10 · Oct 17, 2021 · Updated 4 years ago
- ⚡️Lightweight framework for NLP research, based on PyTorch⚡️ ☆12 · Apr 5, 2023 · Updated 2 years ago
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆27 · Feb 4, 2023 · Updated 3 years ago
- A repository for ACL 2022 paper "How do we answer complex questions: Discourse structure of long form answers" ☆19 · May 31, 2025 · Updated 9 months ago
- ☆16 · Mar 9, 2018 · Updated 8 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 11 months ago
- ☆13 · Dec 12, 2025 · Updated 3 months ago
- EACL 2017 ☆26 · Apr 22, 2018 · Updated 7 years ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation ☆23 · May 1, 2022 · Updated 3 years ago
- Code for Navigating Connected Memories with a Task-oriented Dialog System ☆17 · Dec 12, 2022 · Updated 3 years ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆107 · Sep 23, 2023 · Updated 2 years ago
- Code associated with the paper "Inducing brain-relevant bias in natural language processing models" in the proceedings of the 33rd Confer… ☆13 · Nov 13, 2019 · Updated 6 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆24 · Nov 23, 2022 · Updated 3 years ago
- ☆12 · Mar 31, 2020 · Updated 5 years ago
- ☆21 · Sep 5, 2023 · Updated 2 years ago
- Python package to augment multilingual data ☆15 · Feb 15, 2023 · Updated 3 years ago
- Efficient APR with LLMs http://arxiv.org/pdf/2402.06598 ☆16 · May 28, 2024 · Updated last year
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆96 · Aug 18, 2023 · Updated 2 years ago
- Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models ☆15 · Feb 20, 2019 · Updated 7 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ☆11 · Apr 18, 2023 · Updated 2 years ago
- homepage for proFL ☆23 · Apr 26, 2021 · Updated 4 years ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆30 · Nov 25, 2021 · Updated 4 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 2 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation" ☆13 · Feb 14, 2022 · Updated 4 years ago
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs ☆191 · Oct 12, 2023 · Updated 2 years ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆79 · Nov 14, 2024 · Updated last year
- The source code and the data for ACL 2022 paper "Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Dat… ☆14 · Apr 21, 2023 · Updated 2 years ago