This repository presents the original implementation of "Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method" by Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, and Xueqi Cheng.
☆22 · May 21, 2025 · Updated 9 months ago
Alternatives and similar repositories for DC-PDD
Users who are interested in DC-PDD are comparing it to the repositories listed below.
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ☆52 · May 26, 2025 · Updated 9 months ago
- ☆22 · Dec 22, 2024 · Updated last year
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆34 · Jun 13, 2025 · Updated 8 months ago
- EARAM for fake news detection ☆13 · Dec 30, 2025 · Updated 2 months ago
- FTRL-Proximal Online Learning Algorithm ☆15 · May 22, 2017 · Updated 8 years ago
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns ☆13 · Mar 1, 2025 · Updated last year
- ☆12 · Dec 14, 2024 · Updated last year
- ☆16 · Sep 4, 2025 · Updated 6 months ago
- EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets ☆10 · Dec 12, 2023 · Updated 2 years ago
- ICML2025: One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework ☆14 · Jun 24, 2025 · Updated 8 months ago
- ☆11 · Nov 13, 2024 · Updated last year
- ☆11 · Nov 17, 2024 · Updated last year
- A distributed machine learning algorithm for new word discovery. ☆15 · Jul 21, 2014 · Updated 11 years ago
- Encoder-decoders for translating different chemical formats. ☆19 · Sep 17, 2025 · Updated 5 months ago
- ACL24 ☆11 · Jun 7, 2024 · Updated last year
- Causal Reasoning for Membership Inference Attacks ☆11 · Oct 21, 2022 · Updated 3 years ago
- Implementation for EACL 2024 paper "Corpus-Steered Query Expansion with Large Language Models" ☆12 · Mar 19, 2024 · Updated last year
- Code for the paper "Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction" … ☆12 · Sep 6, 2023 · Updated 2 years ago
- Learning from Indirect Observations ☆11 · Jul 16, 2021 · Updated 4 years ago
- Source code of "Multimodal Matching-aware Co-attention Networks with Mutual Knowledge Distillation for Fake News Detection" ☆13 · Nov 17, 2023 · Updated 2 years ago
- ☆15 · Apr 4, 2024 · Updated last year
- ☆12 · Sep 26, 2024 · Updated last year
- ☆14 · Dec 12, 2024 · Updated last year
- Mini Model Daemon ☆12 · Nov 9, 2024 · Updated last year
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and … ☆11 · Jun 18, 2024 · Updated last year
- A library for structural-semantic chunking of documents. ☆12 · Oct 8, 2025 · Updated 5 months ago
- Implementation for ACL 2024 paper "Meta-Task Prompting Elicits Embeddings from Large Language Models" ☆12 · Jul 25, 2024 · Updated last year
- Official implementation of Panacea: A foundation model for clinical trial design, recruitment, search, and summarization. ☆18 · Dec 24, 2024 · Updated last year
- allowing R users to work with dlib through Rcpp ☆13 · Apr 11, 2018 · Updated 7 years ago
- ☆10 · Dec 20, 2023 · Updated 2 years ago
- This project has included related source codes and datasets of our EMNLP2021 paper ☆10 · May 28, 2022 · Updated 3 years ago
- Classification with PyTorch. ☆10 · Feb 22, 2021 · Updated 5 years ago
- Audio-only Emotion Detection using Federated Learning ☆10 · Dec 8, 2022 · Updated 3 years ago
- [WWW 25] USPTO-LLM: A Large Language Model-Assisted Information-enriched Chemical Reaction Dataset ☆16 · Dec 12, 2024 · Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆11 · Apr 18, 2025 · Updated 10 months ago
- Shadow Attack, LiRA, Quantile Regression and RMIA implementations in PyTorch (Online version) ☆14 · Nov 8, 2024 · Updated last year
- ☆10 · Jun 19, 2024 · Updated last year
- ☆11 · Dec 22, 2021 · Updated 4 years ago
- ☆11 · Jun 4, 2021 · Updated 4 years ago