swj0419 / detect-pretrain-codeLinks

This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu , Terra Blevins , Danqi Chen , Luke Zettlemoyer.

☆228

Alternatives and similar repositories for detect-pretrain-code

Users that are interested in detect-pretrain-code are comparing it to the libraries listed below

Sorting:

PKU-Alignment / beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
☆151Updated last year
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆98Updated 2 weeks ago
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆183Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆127Updated last year
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆114Updated last year
ntunlp / LLMSanitize
An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
☆57Updated 11 months ago
shizhediao / R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…
☆114Updated last year
alon-albalak / data-selection-survey
A Survey on Data Selection for Language Models
☆245Updated 3 months ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68Updated 2 years ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆146Updated 9 months ago
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆166Updated 2 months ago
OpenSafetyLab / SALAD-BENCH
【ACL 2024】 SALAD benchmark & MD-Judge
☆156Updated 4 months ago
xiaoya-li / Instruction-Tuning-Survey
Project for the paper entitled `Instruction Tuning for Large Language Models: A Survey`
☆180Updated 8 months ago
vr25 / hallucination-foundation-model-survey
A Survey of Hallucination in Large Foundation Models
☆54Updated last year
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆78Updated last year
liyucheng09 / Contamination_Detector
Lightweight tool to identify Data Contamination in LLMs evaluation
☆51Updated last year
tatsu-lab / test_set_contamination
☆38Updated last year
GAIR-NLP / auto-j
Generative Judge for Evaluating Alignment
☆244Updated last year
Spico197 / Humpback
🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.
☆140Updated 3 months ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆72Updated 8 months ago
SuperBruceJia / Awesome-LLM-Self-Consistency
Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models
☆105Updated 2 weeks ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75Updated 2 months ago
icip-cas / awesome-auto-alignment
Collection of papers for scalable automated alignment.
☆93Updated 9 months ago
Glaciohound / LM-Steer
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
☆123Updated 3 weeks ago
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆114Updated 10 months ago
voidism / DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
☆504Updated 6 months ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆128Updated last year
Hunter-DDM / knowledge-neurons
Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers"
☆170Updated last year
nayeon7lee / FactualityPrompt
☆87Updated 2 years ago
TIGER-AI-Lab / LongICLBench
Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]
☆105Updated 5 months ago