☆16Apr 7, 2025Updated 10 months ago
Alternatives and similar repositories for DOOR-Alignment
Users who are interested in DOOR-Alignment are comparing it to the libraries listed below.
- ☆20Nov 15, 2024Updated last year
- [ACL2025 Best Paper] Language Models Resist Alignment☆43Jun 11, 2025Updated 8 months ago
- ☆29May 22, 2025Updated 9 months ago
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion☆58Oct 1, 2025Updated 5 months ago
- Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models☆27Mar 15, 2025Updated 11 months ago
- ☆26Feb 14, 2024Updated 2 years ago
- ☆121Feb 3, 2025Updated last year
- ☆35Feb 20, 2025Updated last year
- Official implementation of CVPR 2024 paper "Prompt Learning via Meta-Regularization".☆32Mar 10, 2025Updated 11 months ago
- [CVPR 2023] Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection☆30Jun 21, 2023Updated 2 years ago
- ☆35May 21, 2025Updated 9 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization☆29Jul 9, 2024Updated last year
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage)☆10Feb 2, 2024Updated 2 years ago
- ☆13Oct 5, 2025Updated 4 months ago
- Measuring the situational awareness of language models☆40Feb 12, 2024Updated 2 years ago
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge☆39Sep 17, 2025Updated 5 months ago
- Build an AI bot in Discord to serve user's personalized reports on what's up in tech☆28Sep 14, 2025Updated 5 months ago
- A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor☆29Jan 13, 2026Updated last month
- ☆43Feb 9, 2026Updated 3 weeks ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"☆47May 31, 2024Updated last year
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep☆174Apr 23, 2025Updated 10 months ago
- Visualizing the learned space-time attention using Attention Rollout☆40Apr 1, 2022Updated 3 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch☆10Aug 7, 2024Updated last year
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆16Dec 10, 2024Updated last year
- ☆14May 1, 2023Updated 2 years ago
- Entry for the 2020 First Hunan Province Artificial Intelligence Competition☆11Feb 17, 2022Updated 4 years ago
- YOLO object detection algorithm☆15Jul 27, 2025Updated 7 months ago
- Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha…☆11Nov 26, 2024Updated last year
- ☆12Jul 8, 2024Updated last year
- Instituto de Telecomunicações Deep Learning-based Point Cloud Codec☆11Jun 18, 2024Updated last year
- Reference implementation of Thin and Deep Gaussian Processes (NeurIPS 2023)☆14Nov 25, 2024Updated last year
- An implementation of MSSRM method☆11Mar 23, 2023Updated 2 years ago
- ☆16Jan 16, 2025Updated last year
- Debiasing Through Data Attribution☆12May 23, 2024Updated last year
- my profile readme☆14Updated this week
- [CVPR 2025] Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation☆19Dec 18, 2025Updated 2 months ago
- Code for the AAAI 2024 paper: "AGS: Affordable and Generalizable Substitute Training for Transferable Adversarial Attack" (accepted).☆12Mar 28, 2024Updated last year
- A toolkit for testing and improving named entity recognition [ESEC/FSE'23]☆11Aug 31, 2023Updated 2 years ago
- Official implementation of Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation☆13Apr 15, 2024Updated last year