swj0419/detect-pretrain-code-contamination

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/swj0419/detect-pretrain-code-contamination)

swj0419 / detect-pretrain-code-contamination

☆78

Alternatives and similar repositories for detect-pretrain-code-contamination

Users that are interested in detect-pretrain-code-contamination are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

KyujinHan / Sakura-SOLAR-DPO
View on GitHub
Sakura-SOLAR-DPO: Merge, SFT, and DPO
☆116Dec 30, 2023Updated 2 years ago
QuixiAI / laserRMT
View on GitHub
This is our own implementation of 'Layer Selective Rank Reduction'
☆240May 26, 2024Updated 2 years ago
QuixiAI / generate
View on GitHub
☆27Mar 13, 2024Updated 2 years ago
kh-kim / nlp-express-practice
View on GitHub
☆10Jan 20, 2024Updated 2 years ago
ilur98 / DGQ
View on GitHub
Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
☆14Dec 27, 2023Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
arcee-ai / mergekit
View on GitHub
Tools for merging pretrained large language models.
☆7,261Jun 17, 2026Updated last month
VAGOsolutions / sauerkrautlm-colpali
View on GitHub
☆16Mar 1, 2026Updated 4 months ago
cisnlp / GlotCC
View on GitHub
[NeurIPS 2024] 🕸 GlotCC Dataset and Pipline
☆21Apr 6, 2025Updated last year
felix01189 / SEED
View on GitHub
☆14Jan 31, 2025Updated last year
applicaai / CCpdf
View on GitHub
Index of URLs to pdf files all over the internet and scripts
☆25May 2, 2023Updated 3 years ago
serp-ai / Parameter-Efficient-MoE
View on GitHub
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
☆31May 22, 2024Updated 2 years ago
QuixiAI / extract-expert
View on GitHub
Extract a single expert from a Mixture Of Experts model using slerp interpolation.
☆19May 26, 2024Updated 2 years ago
lm-sys / llm-decontaminator
View on GitHub
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆325Dec 20, 2023Updated 2 years ago
QuixiAI / spectrum
View on GitHub
☆145Aug 20, 2025Updated 11 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Aratako / Task-Vector-Merge-Optimzier
View on GitHub
☆16Apr 11, 2024Updated 2 years ago
TRI-ML / linear_open_lm
View on GitHub
A repository for research on medium sized language models.
☆78May 23, 2024Updated 2 years ago
IVADL / tomato-disease-detector
View on GitHub
prototype of plant-disease-detector
☆10Apr 21, 2021Updated 5 years ago
Gryphe / BlockMerge_Gradient
View on GitHub
Merge Transformers language models by use of gradient parameters.
☆215Aug 8, 2024Updated last year
s-smits / grpo-optuna
View on GitHub
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆60Oct 18, 2025Updated 9 months ago
arcee-ai / PruneMe
View on GitHub
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆267Apr 23, 2024Updated 2 years ago
mlabonne / llm-autoeval
View on GitHub
Automatically evaluate your LLMs in Google Colab
☆695May 7, 2024Updated 2 years ago
jondurbin / qlora
View on GitHub
QLoRA: Efficient Finetuning of Quantized LLMs
☆79Apr 10, 2024Updated 2 years ago
arcee-ai / EvolKit
View on GitHub
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆258Oct 30, 2024Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
paust-team / pko-t5
View on GitHub
bpe based korean t5 model for text-to-text unified framework
☆63Apr 17, 2024Updated 2 years ago
sangHa0411 / Llama-Instruction-Tuning
View on GitHub
☆10Dec 28, 2023Updated 2 years ago
ictnlp / TACS
View on GitHub
Source code for Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts
☆17Sep 2, 2024Updated last year
jwjohns / LFM2Sloth
View on GitHub
Modular task agnostic training pipeline using LFM2 from Liquid AI with unsloth.
☆16Sep 13, 2025Updated 10 months ago
Jellyfish042 / uncheatable_eval
View on GitHub
Evaluating LLMs with Dynamic Data
☆117May 9, 2026Updated 2 months ago
stringandstickytape / MaxsAiStudio
View on GitHub
A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others.
☆36May 11, 2026Updated 2 months ago
SempraETY / Pruning-via-Merging
View on GitHub
☆23Nov 26, 2024Updated last year
skywalker023 / sodaverse
View on GitHub
🥤🧑🏻‍🚀Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization…
☆243Jan 23, 2026Updated 6 months ago
iantbutler01 / ditty
View on GitHub
A library for simplifying training with multi gpu setups in the HuggingFace / PyTorch ecosystem.
☆16Jun 10, 2026Updated last month
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
brittlewis12 / autogguf
View on GitHub
Easily convert HuggingFace models to GGUF-format for llama.cpp
☆22Jul 27, 2024Updated last year
liziniu / cold_start_rl
View on GitHub
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
☆20Mar 9, 2025Updated last year
UCDvision / NOLA
View on GitHub
Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"
☆59Aug 25, 2024Updated last year
PratikSingh121 / Langchain
View on GitHub
My Langchain Code archive maybe
☆24Dec 25, 2023Updated 2 years ago
DjagbleyEmmanuel / llamafile-convert_gguf_UI
View on GitHub
This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users…
☆14Jan 2, 2026Updated 6 months ago
kaistAI / Janus
View on GitHub
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆53Aug 10, 2025Updated 11 months ago
allenai / CommonGen-Eval
View on GitHub
Evaluating LLMs with CommonGen-Lite
☆95Mar 21, 2024Updated 2 years ago