CarperAI/decontamination

This repository contains code for removing benchmark data from your training data, to help combat data snooping.

☆28

Alternatives and similar repositories for decontamination

Users interested in decontamination are comparing it to the repositories listed below.

vaguenebula / AlpacaDataReflect
An experiment to see whether ChatGPT can improve the output of the Stanford Alpaca dataset.
☆12 · Mar 29, 2023 · Updated 3 years ago
yoichi1484 / subspace
An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024)
☆10 · May 31, 2024 · Updated 2 years ago
AnswerDotAI / playwrightnb
Use sync mode Playwright interactively, inside a Jupyter notebook
☆20 · Updated this week
facebookresearch / lss_eval
A new metric for evaluating the faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31 · Aug 25, 2023 · Updated 2 years ago
Shopify / torch-grammar
☆68 · Jan 26, 2026 · Updated 5 months ago
CarperAI / squeakily
A library for squeakily cleaning and filtering language datasets.
☆50 · Jul 10, 2023 · Updated 3 years ago
ntunlp / xCodeEval
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
☆90 · Sep 17, 2024 · Updated last year
masora1030 / eigoyurusan
To be able to read without improving your English skills.
☆10 · Jul 22, 2020 · Updated 5 years ago
wietsedv / gpt2-recycle
As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021)
☆48 · Aug 2, 2021 · Updated 4 years ago
Atulvermaon18 / RASA_CHATBOT
AI assistance
☆12 · Jan 5, 2023 · Updated 3 years ago
iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32 · Jan 29, 2024 · Updated 2 years ago
AnswerDotAI / pythonrunscript
Run python scripts, auto-installing their dependencies in cached, isolated environments
☆32 · Oct 8, 2024 · Updated last year
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆74 · Jun 19, 2024 · Updated 2 years ago
ChrisHayduk / qlora-multi-gpu
QLoRA with enhanced multi-GPU support
☆38 · Aug 8, 2023 · Updated 2 years ago
lowRISC / container-hotplug
Hot-plug devices into a Docker container as they are plugged in.
☆16 · Nov 18, 2025 · Updated 8 months ago
kyegomez / Ocean
Ultra-fast multimodal vector database
☆18 · Feb 21, 2024 · Updated 2 years ago
AroopN / LeetCode-scraping
Scrapes LeetCode with Selenium to collect the upvotes and downvotes on each question
☆15 · Jan 15, 2020 · Updated 6 years ago
theblackcat102 / evol-dataset
Evol-augment any dataset online
☆61 · Aug 3, 2023 · Updated 2 years ago
davidbrochart / ipyhtmx
Build modern UIs in Jupyter with Python
☆12 · Dec 28, 2022 · Updated 3 years ago
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆92 · Sep 22, 2025 · Updated 9 months ago
Nitrokey / qubes-oem
☆14 · Mar 3, 2026 · Updated 4 months ago
philschmid / huggingface-container
☆10 · Dec 15, 2022 · Updated 3 years ago
nateraw / hf-hub-lightning
A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️
☆35 · May 13, 2022 · Updated 4 years ago
sap-ient-ai / FFF
FastFeedForward Networks
☆20 · Dec 8, 2023 · Updated 2 years ago
notarussianteenager / srf-attention
Simplex Random Feature attention, in PyTorch
☆76 · Oct 10, 2023 · Updated 2 years ago
AblateIt / finetune-study
A comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tuning.
☆82 · Sep 10, 2023 · Updated 2 years ago
AnswerDotAI / toolslm
Tools to make language models a bit easier to use
☆67 · Updated this week
kotoba-tech / kotoba-recipes
Supports continual pre-training and instruction tuning; forked from llama-recipes
☆34 · Feb 17, 2024 · Updated 2 years ago
fastai / docments
Document parameters using comments
☆10 · Aug 6, 2021 · Updated 4 years ago
VikParuchuri / textbook_quality
Generate textbook-quality synthetic LLM pretraining data
☆508 · Oct 19, 2023 · Updated 2 years ago
taylorai / galactic
data cleaning and curation for unstructured text
☆329 · Aug 6, 2024 · Updated last year
zxqfl / flag
☆18 · Dec 1, 2023 · Updated 2 years ago
wantedly / intern-info
Information about Wantedly's internships and new-graduate recruiting
☆11 · Apr 5, 2022 · Updated 4 years ago
johnrobinsn / alpaca_lora_30b_4bit
☆18 · Apr 3, 2023 · Updated 3 years ago
AI-Maker-Space / FastAPI-LLM-Model-Serving
How to quickly serve an LLM using Fast API, Celery, and Redis
☆17 · Aug 29, 2023 · Updated 2 years ago
YuheiNakasaka / sb2md-rs
☆11 · Jul 4, 2022 · Updated 4 years ago
sanand0 / uv-mega
uv - MEGA. Make Environments Great Again (talk)
☆11 · Feb 22, 2025 · Updated last year
isucon / isucon12-prior
☆10 · Jun 8, 2022 · Updated 4 years ago
AnswerDotAI / py-smi
Convenient access to `pynvml` (the library behind `nvidia-smi`)
☆23 · Oct 18, 2024 · Updated last year