SALT-NLP / PrivacyLens
A data construction and evaluation framework for quantifying the privacy norm awareness of language models (LMs) and the emerging privacy risks of LM agents. (NeurIPS 2024 D&B)
☆25 · Updated last month
Alternatives and similar repositories for PrivacyLens:
Users interested in PrivacyLens are comparing it to the libraries listed below.
- Augmenting Statistical Models with Natural Language Parameters ☆26 · Updated 7 months ago
- LoFiT: Localized Fine-tuning on LLM Representations ☆37 · Updated 3 months ago
- [ICLR 2024] Paper showing the properties of safety tuning and exaggerated safety ☆80 · Updated 11 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆141 · Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆53 · Updated 5 months ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆35 · Updated 8 months ago
- [NeurIPS 2023 D&B Track] Code and data for the paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆33 · Updated last year
- ☆10 · Updated 2 months ago
- ☆29 · Updated 11 months ago
- ☆49 · Updated last year
- ☆44 · Updated 7 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆80 · Updated 7 months ago
- Weak-to-Strong Jailbreaking on Large Language Models ☆73 · Updated last year
- This repository contains data, code and models for contextual noncompliance. ☆21 · Updated 9 months ago
- ☆63 · Updated 3 months ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆109 · Updated 7 months ago
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment" ☆97 · Updated last year
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆60 · Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆58 · Updated last year
- The Paper List on Data Contamination for Large Language Model Evaluation. ☆92 · Updated 3 weeks ago
- ☆155 · Updated 5 months ago
- ☆27 · Updated 9 months ago
- AbstainQA, ACL 2024 ☆25 · Updated 6 months ago
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆36 · Updated last year
- ☆26 · Updated last month
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆52 · Updated 8 months ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆22 · Updated last month
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆72 · Updated last month
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆95 · Updated 2 months ago
- ☆78 · Updated 2 years ago