chentong0 / copy-bench
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
☆14 · Updated 8 months ago
Alternatives and similar repositories for copy-bench:
Users interested in copy-bench are comparing it to the repositories listed below.
- LoFiT: Localized Fine-tuning on LLM Representations ☆34 · Updated 2 months ago
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆47 · Updated 6 months ago
- ☆17 · Updated 3 weeks ago
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆17 · Updated 3 months ago
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con…" ☆41 · Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Updated 8 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆95 · Updated last month
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆71 · Updated 3 weeks ago
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ☆37 · Updated last month
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆35 · Updated 4 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆80 · Updated 6 months ago
- This repository contains data, code and models for contextual noncompliance. ☆20 · Updated 8 months ago
- ☆42 · Updated last month
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆52 · Updated 4 months ago
- ✨ Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ☆15 · Updated 6 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆56 · Updated 6 months ago
- ☆25 · Updated 6 months ago
- ☆29 · Updated 11 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆63 · Updated 5 months ago
- ☆47 · Updated last year
- AbstainQA, ACL 2024 ☆25 · Updated 5 months ago
- ☆38 · Updated last year
- ☆49 · Updated 7 months ago
- Augmenting Statistical Models with Natural Language Parameters ☆23 · Updated 6 months ago
- ☆34 · Updated 6 months ago
- ☆21 · Updated 2 weeks ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆83 · Updated 8 months ago
- ☆27 · Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆89 · Updated 10 months ago
- Source code and data for ADEPT: A DEbiasing PrompT Framework (AAAI-23) ☆14 · Updated 3 months ago