hitz-zentroa/lm-contamination

The LM Contamination Index is a manually created database of contamination evidence for LMs.

☆81

Alternatives and similar repositories for lm-contamination

Users that are interested in lm-contamination are comparing it to the libraries listed below.

ikergarcia1996 / T-Projection
T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.
☆13 · Nov 21, 2023 · Updated 2 years ago
juletx / self-translate
Do Multilingual Language Models Think Better in English?
☆42 · Aug 3, 2023 · Updated 2 years ago
martiansideofthemoon / relic-retrieval
Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).
☆20 · May 14, 2022 · Updated 4 years ago
SimengSun / ChapterBreak
☆12 · Jun 5, 2024 · Updated 2 years ago
sail-sg / symbolic-instruction-tuning
The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".
☆65 · Apr 18, 2023 · Updated 3 years ago
LLM360 / TxT360
☆25 · Dec 18, 2024 · Updated last year
BBN-E / ZS4IE
ZS4IE: A Toolkit for Zero-Shot Information Extraction with Simple Verbalizations
☆29 · Mar 28, 2022 · Updated 4 years ago
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆117 · Jun 2, 2026 · Updated last month
iai-group / table-retrieval
☆11 · Jan 3, 2023 · Updated 3 years ago
swiseman / neighbor-splicing
☆11 · Jan 2, 2022 · Updated 4 years ago
huhailinguist / ChineseNLIProbing
☆10 · Oct 17, 2021 · Updated 4 years ago
StefanHeng / ProgGen
Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"
☆17 · Mar 29, 2024 · Updated 2 years ago
karlstratos / ammi
☆11 · Jul 15, 2020 · Updated 6 years ago
qiangning / StructTempRel-EMNLP17
☆16 · Mar 9, 2018 · Updated 8 years ago
RamonYeung / torchlight
⚡️Lightweight framework for NLP research, based on PyTorch⚡️
☆12 · Apr 5, 2023 · Updated 3 years ago
cisnlp / MEXA
[ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11 · Apr 6, 2025 · Updated last year
facebookresearch / comet_memory_dialog
Code for Navigating Connected Memories with a Task-oriented Dialog System
☆18 · Dec 12, 2022 · Updated 3 years ago
INK-USC / ReCross
ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation
☆23 · May 1, 2022 · Updated 4 years ago
neelsjain / BYOD
The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"
☆108 · Sep 23, 2023 · Updated 2 years ago
wyu97 / RACo
Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022.
☆24 · Nov 23, 2022 · Updated 3 years ago
danrsc / bert_brain_neurips_2019
Code associated with the paper "Inducing brain-relevant bias in natural language processing models" in the proceedings of the 33rd Confer…
☆13 · Nov 13, 2019 · Updated 6 years ago
thompsonb / prism
MT Evaluation in Many Languages via Zero-Shot Paraphrasing
☆102 · Jul 25, 2024 · Updated 2 years ago
Adaxry / Post-Instruction
☆21 · Sep 5, 2023 · Updated 2 years ago
jwieting / paraphrastic-representations-at-scale
☆74 · Jul 2, 2021 · Updated 5 years ago
shtoshni / g2p
Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models
☆15 · Feb 20, 2019 · Updated 7 years ago
Unbabel / smaug
Python package to augment multilingual data
☆15 · Feb 15, 2023 · Updated 3 years ago
Andrewzh112 / AI-Research-Interview-Lab
☆31 · Nov 14, 2025 · Updated 8 months ago
nlp-uoregon / Okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
☆96 · Aug 18, 2023 · Updated 2 years ago
MikeWangWZHL / Zemi
Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings
☆15 · May 3, 2023 · Updated 3 years ago
hitz-zentroa / latxa
Latxa: An Open Language Model and Evaluation Suite for Basque
☆36 · Dec 15, 2025 · Updated 7 months ago
sairin1202 / SciXGen
Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"
☆13 · Feb 14, 2022 · Updated 4 years ago
ASSERT-KTH / RewardRepair
Neural Program Repair with Execution-based Backpropagation http://arxiv.org/pdf/2105.04123
☆25 · Dec 19, 2022 · Updated 3 years ago
juletx / alpaca-lora-mt
Instruct-tuning LLaMA on consumer hardware with machine-translated data
☆19 · Apr 17, 2023 · Updated 3 years ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆80 · Nov 14, 2024 · Updated last year
tangjialong / Knowledge-Projection-for-ERE
Source codes for #ACL2021 paper "From Discourse to Narrative: Knowledge Projection for Event Relation Extraction"
☆19 · Oct 22, 2022 · Updated 3 years ago
shauryr / ACL-anthology-corpus
This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs
☆192 · Oct 12, 2023 · Updated 2 years ago
veronica320 / Zeroshot-Event-Extraction
Repository for ACL2021 paper: <Zero-shot Event Extraction via Transfer Learning: Challenges and Insights>.
☆30 · Jan 5, 2023 · Updated 3 years ago
AISG-Technology-Team / AISG-Online-Safety-Challenge-Submission-Guide
Submission Guide + Discussion Board for AI Singapore Online Safety Prize Challenge
☆14 · Mar 20, 2024 · Updated 2 years ago
r-three / realistic_evaluation_of_model_merging_for_compositional_generalization
☆13 · Feb 11, 2026 · Updated 5 months ago