A toolkit to assess data privacy in LLMs (under development)
☆68 · Jan 2, 2025 · Updated last year
Alternatives and similar repositories for LLM-PBE
Users who are interested in LLM-PBE compare it to the libraries listed below.
- End-to-end codebase for finetuning LLMs (LLaMA 2, 3, etc.) with or without DP ☆16 · Sep 23, 2024 · Updated last year
- ☆43 · May 23, 2023 · Updated 2 years ago
- Code for ACL 2024 paper: PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models. ☆16 · Feb 5, 2025 · Updated last year
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Apr 4, 2025 · Updated 11 months ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆42 · Aug 20, 2024 · Updated last year
- A Synthetic Dataset for Personal Attribute Inference (NeurIPS'24 D&B) ☆52 · Jul 27, 2025 · Updated 7 months ago
- ☆16 · May 16, 2025 · Updated 9 months ago
- ☆20 · Feb 3, 2025 · Updated last year
- OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams (VLDB 2024) ☆13 · Aug 27, 2024 · Updated last year
- ☆27 · Oct 6, 2024 · Updated last year
- [ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer ☆46 · May 30, 2024 · Updated last year
- Watermarking LLM papers up-to-date ☆11 · Dec 17, 2023 · Updated 2 years ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆174 · Apr 23, 2025 · Updated 10 months ago
- ☆21 · Jun 22, 2025 · Updated 8 months ago
- A tiny easily hackable implementation of a feature dashboard. ☆15 · Oct 21, 2025 · Updated 4 months ago
- Source code of "PathEnum: Towards Real-Time Hop-Constrained s-t Path Enumeration", published in SIGMOD'2021 - By Shixuan Sun, Yuhang Chen… ☆17 · Mar 23, 2021 · Updated 4 years ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆89 · Mar 30, 2025 · Updated 11 months ago
- ☆35 · May 21, 2025 · Updated 9 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Sep 24, 2024 · Updated last year
- Feature partitioner by imbalance or correlation (ICLR 2024) ☆17 · Feb 27, 2026 · Updated last week
- Learning Safety Constraints for Large Language Models (ICML2025) ☆31 · Aug 4, 2025 · Updated 7 months ago
- A TinyStories LM with SAEs and transcoders ☆14 · Apr 3, 2025 · Updated 11 months ago
- ☆20 · Jun 16, 2025 · Updated 8 months ago
- The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred… ☆104 · Aug 13, 2024 · Updated last year
- ☆33 · Jul 9, 2025 · Updated 7 months ago
- Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024) ☆17 · Oct 22, 2024 · Updated last year
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆66 · Sep 30, 2024 · Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆355 · Jun 13, 2025 · Updated 8 months ago
- Private Adaptive Optimization with Side Information (ICML '22) ☆16 · Jun 23, 2022 · Updated 3 years ago
- Fast Multiple Independent Random Number Sequences Generation on FPGAs ☆15 · Sep 19, 2021 · Updated 4 years ago
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting ☆18 · Apr 15, 2025 · Updated 10 months ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆19 · Dec 14, 2024 · Updated last year
- ☆18 · Feb 25, 2026 · Updated last week
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆37 · Jun 10, 2024 · Updated last year
- ☆29 · Feb 27, 2025 · Updated last year
- Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024) ☆22 · May 14, 2024 · Updated last year
- [ACM MM 2023] Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer. ☆22 · Feb 23, 2024 · Updated 2 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆24 · Nov 23, 2022 · Updated 3 years ago
- WAFFLE: Watermarking in Federated Learning ☆23 · Aug 21, 2023 · Updated 2 years ago