yxwan123 / BiasAsker
☆15 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for BiasAsker
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆71 · Updated 2 months ago
- [ICLR 2024] Paper showing properties of safety tuning and exaggerated safety. ☆71 · Updated 6 months ago
- [ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models" ☆62 · Updated 8 months ago
- Codes and datasets of the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment" ☆79 · Updated 8 months ago
- Code for the paper "Defending against LLM Jailbreaking via Backtranslation" ☆24 · Updated 3 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆52 · Updated last month
- Multilingual safety benchmark for Large Language Models ☆24 · Updated 2 months ago
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆57 · Updated 4 months ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆28 · Updated 2 months ago
- Official implementation of the EMNLP 2021 paper "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks" ☆29 · Updated 3 years ago
- A lightweight library for large language model (LLM) jailbreaking defense. ☆39 · Updated last month
- Official code for the paper "Evaluating Copyright Takedown Methods for Language Models" ☆15 · Updated 4 months ago
- Releasing code for "ReCode: Robustness Evaluation of Code Generation Models" ☆48 · Updated 8 months ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ☆23 · Updated last year
- Official repository for ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding" ☆99 · Updated 4 months ago
- Mostly recording papers about models' trustworthy applications. Intending to include topics like model evaluation & analysis, security, c… ☆20 · Updated last year
- Code for the AAAI 2023 paper "CodeAttack: Code-based Adversarial Attacks for Pre-Trained Programming Language Models" ☆25 · Updated last year
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…" ☆34 · Updated last year
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆46 · Updated 3 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs ☆45 · Updated 2 weeks ago
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting ☆13 · Updated 7 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆43 · Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆65 · Updated 2 years ago
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences" ☆19 · Updated last month
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆60 · Updated last month