The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆81 · Apr 11, 2024 · Updated 2 years ago
Alternatives and similar repositories for lm-contamination
Users that are interested in lm-contamination are comparing it to the libraries listed below.
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 · Nov 21, 2023 · Updated 2 years ago
- Do Multilingual Language Models Think Better in English? ☆42 · Aug 3, 2023 · Updated 2 years ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · May 14, 2022 · Updated 4 years ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆65 · Apr 18, 2023 · Updated 3 years ago
- ☆12 · Jun 5, 2024 · Updated last year
- ZS4IE: A Toolkit for Zero-Shot Information Extraction with Simple Verbalizations ☆29 · Mar 28, 2022 · Updated 4 years ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆114 · Jan 29, 2026 · Updated 4 months ago
- ☆11 · Jan 2, 2022 · Updated 4 years ago
- Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models" ☆17 · Mar 29, 2024 · Updated 2 years ago
- ☆10 · Oct 17, 2021 · Updated 4 years ago
- Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs first through wordgrids ☆18 · Feb 19, 2025 · Updated last year
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- ⚡️Lightweight framework for NLP research, based on PyTorch⚡️ ☆12 · Apr 5, 2023 · Updated 3 years ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆27 · Feb 4, 2023 · Updated 3 years ago
- A repository for ACL 2022 paper "How do we answer complex questions: Discourse structure of long form answers" ☆19 · May 31, 2025 · Updated 11 months ago
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- ☆13 · Dec 12, 2025 · Updated 5 months ago
- EACL 2017 ☆26 · Apr 22, 2018 · Updated 8 years ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation ☆23 · May 1, 2022 · Updated 4 years ago
- Code for Navigating Connected Memories with a Task-oriented Dialog System ☆17 · Dec 12, 2022 · Updated 3 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆24 · Nov 23, 2022 · Updated 3 years ago
- ☆12 · Mar 31, 2020 · Updated 6 years ago
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing ☆102 · Jul 25, 2024 · Updated last year
- Code base for ACL 2021 paper, Weakly Supervised Named Entity Tagging with Learnable Logical Rules. ☆20 · Jun 27, 2023 · Updated 2 years ago
- ☆21 · Sep 5, 2023 · Updated 2 years ago
- Python package to augment multilingual data ☆15 · Feb 15, 2023 · Updated 3 years ago
- Efficient APR with LLMs http://arxiv.org/pdf/2402.06598 ☆16 · May 28, 2024 · Updated 2 years ago
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models ☆15 · Feb 20, 2019 · Updated 7 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ☆11 · Apr 18, 2023 · Updated 3 years ago
- homepage for proFL ☆23 · Apr 26, 2021 · Updated 5 years ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆30 · Nov 25, 2021 · Updated 4 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 · Aug 15, 2023 · Updated 2 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 3 years ago
- Instruct-tuning LLaMA on consumer hardware with machine-translated data ☆19 · Apr 17, 2023 · Updated 3 years ago
- Repository for ACL2021 paper: <Zero-shot Event Extraction via Transfer Learning: Challenges and Insights>. ☆30 · Jan 5, 2023 · Updated 3 years ago
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs ☆192 · Oct 12, 2023 · Updated 2 years ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆79 · Nov 14, 2024 · Updated last year
- [2023 ASE] GAMMA: Revisiting Template-based Automated Program Repair via Mask Prediction ☆23 · May 19, 2023 · Updated 3 years ago