Berardinux / OpenVPN_Connect
☆9 · Updated last year
Related projects
Alternatives and complementary repositories for OpenVPN_Connect
- A voice-message widget with a ripple effect on long press and drag-to-cancel sending ☆25 · Updated 5 years ago
- Public repository for "Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities". ☆13 · Updated last year
- ✨✨ Latest papers about LLM-based evaluators ☆20 · Updated 6 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆44 · Updated last week
- Demo of playing video inside a RecyclerView item ☆11 · Updated 7 years ago
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆21 · Updated 4 months ago
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆70 · Updated 2 months ago
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks" ☆42 · Updated 3 months ago
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting ☆12 · Updated 7 months ago
- Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to Findings of EACL 2024. ☆19 · Updated last month
- ☆153 · Updated 11 months ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM jailbreaking. (NeurIPS 2024) ☆81 · Updated 3 weeks ago
- Official code for the paper "Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications" ☆58 · Updated last month
- Official code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ☆20 · Updated last year
- [ACL 2024] SALAD benchmark & MD-Judge ☆103 · Updated last month
- Includes default, storyboard, and tests templates. ☆10 · Updated 7 years ago
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors ☆33 · Updated 4 months ago
- Official code for the ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis" ☆40 · Updated 2 weeks ago
- No kidding, the most useful baseActivity ☆10 · Updated 8 years ago
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873) ☆120 · Updated 6 months ago
- ☆47 · Updated 4 months ago
- ☆31 · Updated 5 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆152 · Updated last month
- ☆28 · Updated last week
- Weak-to-Strong Jailbreaking on Large Language Models ☆65 · Updated 8 months ago
- For the OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆45 · Updated this week
- LLM Unlearning ☆123 · Updated last year
- [NeurIPS 2024] HonestLLM: Toward an Honest and Helpful Large Language Model ☆18 · Updated last month
- Official code for the paper "Evaluating Copyright Takedown Methods for Language Models" ☆15 · Updated 3 months ago