Awenbocc / LLM-OOD
☆12 · Updated 11 months ago
Alternatives and similar repositories for LLM-OOD
Users interested in LLM-OOD are comparing it to the repositories listed below.
- Toolkit for evaluating the trustworthiness of generative foundation models. ☆106 · Updated 3 weeks ago
- ☆153 · Updated 3 months ago
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆48 · Updated 8 months ago
- ☆92 · Updated 2 months ago
- ☆31 · Updated last month
- ☆18 · Updated last year
- A resource repository for representation engineering in large language models ☆127 · Updated 8 months ago
- ☆24 · Updated last year
- Code and dataset for the paper: "Can Editing LLMs Inject Harm?" ☆19 · Updated 8 months ago
- Official repository for the paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆142 · Updated 2 months ago
- Code for the paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆32 · Updated 5 months ago
- Code repo for the ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" ☆122 · Updated last year
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆80 · Updated 3 months ago
- ICL backdoor attack ☆13 · Updated 8 months ago
- LLM Unlearning ☆171 · Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆130 · Updated 3 months ago
- Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆48 · Updated 3 months ago
- [ACL'25 Main] SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence! | Helping your LLM make better use of in-context documents: a simple attention-based approach ☆17 · Updated 5 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as … ☆65 · Updated this week
- ☆39 · Updated 8 months ago
- Accepted by ECCV 2024 ☆144 · Updated 9 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆59 · Updated last year
- ☆44 · Updated 5 months ago
- JAILJUDGE: A comprehensive evaluation benchmark which includes a wide range of risk scenarios with complex malicious prompts (e.g., synth… ☆48 · Updated 7 months ago
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning … ☆60 · Updated last month
- ☆33 · Updated 9 months ago
- "In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; ICML 2024. ☆27 · Updated last year
- ☆93 · Updated 5 months ago
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆52 · Updated 10 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆62 · Updated 6 months ago