jugechengzi / Rationalization-MGR
ACL 2023 *oral* paper "MGR: Multi-generator based Rationalization"
☆10 · Updated last month
Alternatives and similar repositories for Rationalization-MGR:
Users who are interested in Rationalization-MGR are comparing it to the repositories listed below.
- NeurIPS 2022 paper "FR: Folded rationalization with a unified encoder" · ☆11 · Updated last month
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector" · ☆22 · Updated 2 months ago
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion · ☆32 · Updated 2 months ago
- Accepted by ECCV 2024 · ☆91 · Updated 3 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization · ☆18 · Updated 6 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep · ☆63 · Updated 6 months ago
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?" · ☆55 · Updated 2 months ago
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" · ☆76 · Updated last year
- Landing Page for TOFU · ☆107 · Updated last month
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models · ☆42 · Updated 4 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" · ☆53 · Updated 3 months ago
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts · ☆19 · Updated 3 months ago
- A survey on harmful fine-tuning attack for large language model · ☆124 · Updated this week
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models · ☆51 · Updated 4 months ago
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts" · ☆36 · Updated 6 months ago
- awesome SAE papers · ☆13 · Updated this week
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset… · ☆43 · Updated this week
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks · ☆16 · Updated 8 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning · ☆88 · Updated 7 months ago
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning" · ☆17 · Updated 3 months ago
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024. · ☆61 · Updated 2 months ago
- Source code for EMNLP 2022 paper "Finding Skill Neurons in Pre-trained Transformers via Prompt Tuning". · ☆18 · Updated last year