The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆82 · Apr 11, 2024 · Updated 2 years ago
Alternatives and similar repositories for lm-contamination
Users who are interested in lm-contamination are comparing it to the libraries listed below.
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 · Nov 21, 2023 · Updated 2 years ago
- Do Multilingual Language Models Think Better in English? ☆42 · Aug 3, 2023 · Updated 2 years ago
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library. ☆14 · Jan 9, 2024 · Updated 2 years ago
- ☆22 · Dec 18, 2024 · Updated last year
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · May 14, 2022 · Updated 3 years ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆66 · Apr 18, 2023 · Updated 3 years ago
- ☆11 · Jan 3, 2023 · Updated 3 years ago
- Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models" ☆17 · Mar 29, 2024 · Updated 2 years ago
- ☆10 · Oct 17, 2021 · Updated 4 years ago
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆27 · Feb 4, 2023 · Updated 3 years ago
- ☆16 · Mar 9, 2018 · Updated 8 years ago
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- ☆13 · Dec 12, 2025 · Updated 4 months ago
- EACL 2017 ☆26 · Apr 22, 2018 · Updated 7 years ago
- Code for Navigating Connected Memories with a Task-oriented Dialog System ☆17 · Dec 12, 2022 · Updated 3 years ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation ☆23 · May 1, 2022 · Updated 3 years ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆107 · Sep 23, 2023 · Updated 2 years ago
- Code associated with the paper "Inducing brain-relevant bias in natural language processing models" in the proceedings of the 33rd Confer… ☆13 · Nov 13, 2019 · Updated 6 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆24 · Nov 23, 2022 · Updated 3 years ago
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing ☆102 · Jul 25, 2024 · Updated last year
- Code base for ACL 2021 paper, Weakly Supervised Named Entity Tagging with Learnable Logical Rules. ☆20 · Jun 27, 2023 · Updated 2 years ago
- ☆21 · Sep 5, 2023 · Updated 2 years ago
- Python package to augment multilingual data ☆15 · Feb 15, 2023 · Updated 3 years ago
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models ☆15 · Feb 20, 2019 · Updated 7 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ☆11 · Apr 18, 2023 · Updated 3 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 · Aug 15, 2023 · Updated 2 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 2 years ago
- Instruct-tuning LLaMA on consumer hardware with machine-translated data ☆19 · Apr 17, 2023 · Updated 3 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation" ☆13 · Feb 14, 2022 · Updated 4 years ago
- Repository for ACL2021 paper: <Zero-shot Event Extraction via Transfer Learning: Challenges and Insights>. ☆30 · Jan 5, 2023 · Updated 3 years ago
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs ☆191 · Oct 12, 2023 · Updated 2 years ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆79 · Nov 14, 2024 · Updated last year
- [2023 ASE] GAMMA: Revisiting Template-based Automated Program Repair via Mask Prediction ☆23 · May 19, 2023 · Updated 2 years ago
- Pytorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks ☆62 · Jan 22, 2022 · Updated 4 years ago
- Submission Guide + Discussion Board for AI Singapore Online Safety Prize Challenge ☆14 · Mar 20, 2024 · Updated 2 years ago
- ☆12 · Feb 11, 2026 · Updated 2 months ago
- Neural Program Repair with Execution-based Backpropagation http://arxiv.org/pdf/2105.04123 ☆25 · Dec 19, 2022 · Updated 3 years ago