This repository presents the original implementation of Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method by Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
☆23May 21, 2025Updated 10 months ago
Alternatives and similar repositories for DC-PDD
Users that are interested in DC-PDD are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs☆55May 26, 2025Updated 10 months ago
- ☆23Dec 22, 2024Updated last year
- 大数据应用分类标注挑战赛(NLP),亚军🥈☆20Mar 24, 2023Updated 3 years ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis"☆34Jun 13, 2025Updated 10 months ago
- ☆10Jun 19, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- NLPBench: Evaluating NLP-Related Problem-solving Ability in Large Language Models☆10Oct 27, 2023Updated 2 years ago
- Source code of "Multimodal Matching-aware Co-attention Networks with Mutual Knowledge Distillation for Fake News Detection"☆14Nov 17, 2023Updated 2 years ago
- ☆11Nov 11, 2021Updated 4 years ago
- Unofficial Pytorch implementation of the paper 'Categorical Reparameterization with Gumbel-Softmax' and 'The Concrete Distribution: A Con…☆11Apr 27, 2021Updated 4 years ago
- [ACL 2023] Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation.☆10Dec 19, 2024Updated last year
- ☆18Feb 20, 2024Updated 2 years ago
- EARAM for fake news detection☆14Dec 30, 2025Updated 3 months ago
- Code for the paper "Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction" …☆12Sep 6, 2023Updated 2 years ago
- Mini Model Daemon☆13Nov 9, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Learning from Indirect Observations☆11Jul 16, 2021Updated 4 years ago
- ☆152Apr 16, 2024Updated 2 years ago
- ✨ Official code for our paper: "Uncertainty-o: One Model-agnostic Framework for Unveiling Epistemic Uncertainty in Large Multimodal Model…☆20Mar 13, 2025Updated last year
- [ACL 2026] R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning☆28Jan 4, 2026Updated 3 months ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and …☆11Jun 18, 2024Updated last year
- ☆12Dec 14, 2024Updated last year
- 利用BERT预训练模型进行文本生成,可用于对话、摘要、问题生成等任务。 目前支持策略,词表的插入和删除、自定义Character Embedding、随机词替换等☆10Jun 1, 2022Updated 3 years ago
- The repository provides code for the paper RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders, CIKM'24☆11Oct 21, 2024Updated last year
- Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM | EMNLP 2025 Findings☆18Oct 17, 2025Updated 6 months ago
- Deploy open-source AI quickly and easily - Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Audio-only Emotion Detection using Federated Learning☆10Dec 8, 2022Updated 3 years ago
- Code for our EMNLP 2023 paper - Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Mode…☆15May 5, 2024Updated last year
- ☆23Aug 7, 2023Updated 2 years ago
- Official PyTorch implementation of NeuralSVD (ICML 2024)☆22Sep 14, 2024Updated last year
- Causal Reasoning for Membership Inference Attacks☆11Oct 21, 2022Updated 3 years ago
- ☆15Apr 4, 2024Updated 2 years ago
- E-NER: Evidential Deep Learning for Trustworthy Named Entity Recognition☆25Jul 18, 2023Updated 2 years ago
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6.☆11Mar 1, 2024Updated 2 years ago
- Official implementation of Panacea: A foundation model for clinical trial design, recruitment, search, and summarization.☆18Dec 24, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- GoldFinch and other hybrid transformer components☆13Dec 9, 2025Updated 4 months ago
- ☆26Dec 8, 2025Updated 4 months ago
- Mathematical Analysis (et analyse fonctionnelle)☆14Feb 1, 2022Updated 4 years ago
- ☆16Sep 4, 2025Updated 7 months ago
- ☆18Mar 2, 2026Updated last month
- Self-Teaching Notes on Gradient Leakage Attacks against GPT-2 models.☆15Mar 18, 2024Updated 2 years ago
- The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)☆16May 21, 2024Updated last year