CharlesYu2000 / PCGU-UnlearningBias
☆13, updated last year
Related projects
Alternatives and complementary repositories for PCGU-UnlearningBias
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models (☆76, updated 2 months ago)
- ☆36, updated last year
- ☆16, updated 4 months ago
- Official code implementation of SKU, Accepted by ACL 2024 Findings (☆11, updated 6 months ago)
- ☆111, updated last year
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" (☆47, updated last month)
- ☆39, updated last year
- ☆35, updated 4 months ago
- Unofficial re-implementation of "Trusting Your Evidence: Hallucinate Less with Context-aware Decoding" (☆28, updated last year)
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety. (☆71, updated 6 months ago)
- ☆24, updated 11 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. (☆57, updated 2 weeks ago)
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 (☆62, updated last month)
- Landing Page for TOFU (☆98, updated 5 months ago)
- Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger" (☆37, updated 2 years ago)
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" (☆71, updated 2 months ago)
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models (☆15, updated 4 months ago)
- LLM Unlearning (☆125, updated last year)
- ☆26, updated 6 months ago
- Restore safety in fine-tuned language models through task arithmetic (☆26, updated 7 months ago)
- ☆23, updated 2 months ago
- [ICML 2021] Towards Understanding and Mitigating Social Biases in Language Models (☆60, updated 2 years ago)
- Code for watermarking language models (☆72, updated 2 months ago)
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… (☆29, updated last year)
- ☆38, updated last year
- [ACL2024-Main] Data and Code for WaterBench: Towards Holistic Evaluation of LLM Watermarks (☆18, updated last year)
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) (☆53, updated last month)
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. (☆126, updated last year)
- ☆24, updated last year
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors (☆34, updated 4 months ago)