yxwan123 / BiasAsker
☆15Updated 8 months ago
Related projects
Alternatives and complementary repositories for BiasAsker
- ☆31Updated 4 months ago
- ☆35Updated last year
- Mostly recording papers about models' trustworthy applications. Intending to include topics like model evaluation & analysis, security, c…☆19Updated last year
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models☆15Updated 3 months ago
- Official implementation of the EMNLP 2021 paper "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks"☆28Updated 3 years ago
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.☆70Updated 6 months ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization☆28Updated last month
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"☆69Updated 2 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state.☆52Updated 2 weeks ago
- Multilingual safety benchmark for Large Language Models☆22Updated 2 months ago
- Reinforcement Learning for Repository-Level Code Completion☆12Updated 2 months ago
- ☆11Updated last month
- Repository for the Bias Benchmark for QA dataset.☆84Updated 10 months ago
- ☆109Updated last year
- [ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"☆61Updated 8 months ago
- Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers"☆155Updated 6 months ago
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"☆78Updated 8 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.☆43Updated last year
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"☆19Updated last month
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" in Findings of NAACL 2022☆27Updated 2 years ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories☆46Updated 2 months ago
- Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense…☆137Updated last year
- A resource repository for representation engineering in large language models☆50Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆83Updated 5 months ago
- Source code for the paper "ReACC: A Retrieval-Augmented Code Completion Framework"☆59Updated 2 years ago
- ☆46Updated 2 years ago
- A list of papers and resources dedicated to code generation☆14Updated 2 years ago
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT☆23Updated last year
- Source code of our paper MIND, ACL 2024 Long Paper☆31Updated 5 months ago
- Code for the AAAI 2023 paper "CodeAttack: Code-based Adversarial Attacks for Pre-Trained Programming Language Models"☆25Updated last year