Jometeorie / KnowledgeSpread
☆16 · Updated 2 months ago
Related projects:
- ☆27 · Updated 3 months ago
- ☆28 · Updated 7 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆81 · Updated this week
- Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX… ☆57 · Updated 6 months ago
- The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences" ☆19 · Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆79 · Updated 3 months ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents ☆57 · Updated last month
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆55 · Updated 2 months ago
- LLM Unlearning ☆112 · Updated 11 months ago
- ☆24 · Updated 3 months ago
- ☆76 · Updated 4 months ago
- Implementation of the MATRIX framework (ICML 2024) ☆36 · Updated 4 months ago
- ☆20 · Updated 2 months ago
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆64 · Updated 2 weeks ago
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆81 · Updated 5 months ago
- The awesome agents in the era of large language models ☆48 · Updated 10 months ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆26 · Updated 3 weeks ago
- S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models ☆31 · Updated 2 months ago
- [FCS'24] LVLM Safety paper ☆11 · Updated 5 months ago
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆30 · Updated 2 months ago
- Achieving Efficient Alignment through Learned Correction ☆103 · Updated 3 months ago
- ☆76 · Updated last month
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆34 · Updated 4 months ago
- A repository of useful research/skill-upgrading talks or articles in the NLP/CV/AI area (in Chinese). ☆65 · Updated last month
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆89 · Updated 2 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆86 · Updated 3 months ago
- Multilingual safety benchmark for Large Language Models ☆21 · Updated 2 weeks ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models ☆50 · Updated 2 months ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆48 · Updated 3 weeks ago
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety. ☆61 · Updated 4 months ago