The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆82 · Apr 11, 2024 · Updated 2 years ago
Alternatives and similar repositories for lm-contamination
Users that are interested in lm-contamination are comparing it to the libraries listed below.
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 · Nov 21, 2023 · Updated 2 years ago
- Do Multilingual Language Models Think Better in English? ☆42 · Aug 3, 2023 · Updated 2 years ago
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library. ☆14 · Jan 9, 2024 · Updated 2 years ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · May 14, 2022 · Updated 3 years ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆66 · Apr 18, 2023 · Updated 3 years ago
- ☆12 · Jun 5, 2024 · Updated last year
- ZS4IE: A Toolkit for Zero-Shot Information Extraction with Simple Verbalizations ☆29 · Mar 28, 2022 · Updated 4 years ago
- ☆11 · Jan 3, 2023 · Updated 3 years ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆115 · Jan 29, 2026 · Updated 3 months ago
- ☆11 · Jan 2, 2022 · Updated 4 years ago
- Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models" ☆17 · Mar 29, 2024 · Updated 2 years ago
- ☆10 · Oct 17, 2021 · Updated 4 years ago
- Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs first through wordgrids ☆18 · Feb 19, 2025 · Updated last year
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆27 · Feb 4, 2023 · Updated 3 years ago
- A repository for ACL 2022 paper "How do we answer complex questions: Discourse structure of long form answers" ☆19 · May 31, 2025 · Updated 11 months ago
- ☆13 · Dec 12, 2025 · Updated 4 months ago
- EACL 2017 ☆26 · Apr 22, 2018 · Updated 8 years ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation ☆23 · May 1, 2022 · Updated 4 years ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 · Sep 23, 2023 · Updated 2 years ago
- Code associated with the paper "Inducing brain-relevant bias in natural language processing models" in the proceedings of the 33rd Confer… ☆13 · Nov 13, 2019 · Updated 6 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆24 · Nov 23, 2022 · Updated 3 years ago
- ☆12 · Mar 31, 2020 · Updated 6 years ago
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing ☆102 · Jul 25, 2024 · Updated last year
- Code base for ACL 2021 paper, Weakly Supervised Named Entity Tagging with Learnable Logical Rules. ☆20 · Jun 27, 2023 · Updated 2 years ago
- ☆21 · Sep 5, 2023 · Updated 2 years ago
- Efficient APR with LLMs http://arxiv.org/pdf/2402.06598 ☆16 · May 28, 2024 · Updated last year
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆96 · Aug 18, 2023 · Updated 2 years ago
- Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models ☆15 · Feb 20, 2019 · Updated 7 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ☆11 · Apr 18, 2023 · Updated 3 years ago
- homepage for proFL ☆23 · Apr 26, 2021 · Updated 5 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 · Aug 15, 2023 · Updated 2 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 3 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation" ☆13 · Feb 14, 2022 · Updated 4 years ago
- Repository for ACL2021 paper: <Zero-shot Event Extraction via Transfer Learning: Challenges and Insights>. ☆30 · Jan 5, 2023 · Updated 3 years ago
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs ☆192 · Oct 12, 2023 · Updated 2 years ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆79 · Nov 14, 2024 · Updated last year
- [2023 ASE] GAMMA: Revisiting Template-based Automated Program Repair via Mask Prediction ☆23 · May 19, 2023 · Updated 2 years ago