Awenbocc / LLM-OODLinks

☆14

Alternatives and similar repositories for LLM-OOD

Users that are interested in LLM-OOD are comparing it to the libraries listed below

Sorting:

deeplearning-wisc / haloscope
source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"
☆60Updated 6 months ago
jamqd / Group-Preference-Optimization
☆21Updated last year
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆177Updated 2 years ago
tmlr-group / NoisyRationales
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
☆37Updated 3 months ago
cognizant-ai-labs / semantic-density-paper
This repo contains the source code for reproducing the experimental results in semantic density paper (Neurips 2024)
☆16Updated last month
TrustGen / TrustEval-toolkit
Toolkit for evaluating the trustworthiness of generative foundation models.
☆122Updated 2 months ago
ydyjya / SafetyHeadAttribution
☆53Updated 5 months ago
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆59Updated last year
git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attack for large language model
☆217Updated this week
llm-editing / editing-attack
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
☆21Updated last year
git-disl / Safety-Tax
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆25Updated 7 months ago
Improbable-AI / curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…
☆84Updated last year
THU-BPM / Robust_Watermark
Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024.
☆34Updated 11 months ago
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆85Updated 7 months ago
MartinPawelczyk / In-Context-Unlearning
"In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; ICML 2024.
☆28Updated 2 years ago
flamewei123 / DEPN
☆25Updated last year
AlexanderVNikitin / kernel-language-entropy
Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)
☆32Updated 10 months ago
Persdre / NeurIPS-2024-LLM-Papers
Accepted LLM Papers in NeurIPS 2024
☆37Updated last year
Lingkai-Kong / RE-Control
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆34Updated 9 months ago
usail-hkust / Jailjudge
JAILJUDGE: A comprehensive evaluation benchmark which includes a wide range of risk scenarios with complex malicious prompts (e.g., synth…
☆52Updated 10 months ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆140Updated 11 months ago
ZhiningLiu1998 / SelfElicit
[ACL'25 Main] SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence! | 让你的LLM更好地利用上下文文档：一个基于注意力的简单方案
☆23Updated 8 months ago
swj0419 / muse_bench
☆28Updated 7 months ago
llm-misinformation / llm-misinformation
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
☆77Updated last year
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆47Updated 11 months ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model(LRM) Safety.This repository is used to collect security-related research on large reasoning models such as …
☆76Updated this week
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆161Updated 6 months ago
eth-sri / llmprivacy
☆69Updated 8 months ago
agiresearch / TrustAgent
TrustAgent: Towards Safe and Trustworthy LLM-based Agents
☆54Updated 9 months ago
franciscoliu / Awesome-GenAI-Unlearning
☆173Updated 3 months ago