qiuhuachuan / latent-jailbreakLinks

☆40

Alternatives and similar repositories for latent-jailbreak

Users that are interested in latent-jailbreak are comparing it to the libraries listed below

Sorting:

hongbinye / Cognitive-Mirage-Hallucinations-in-LLMs
Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models"
☆47Updated 2 years ago
Re-Align / just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
☆88Updated last year
shuyhere / about-super-alignment
Feeling confused about super alignment? Here is a reading list
☆43Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69Updated last year
DAMO-NLP-SG / multilingual-safety-for-LLMs
[ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆93Updated last year
liyucheng09 / Contamination_Detector
Lightweight tool to identify Data Contamination in LLMs evaluation
☆52Updated last year
penguinnnnn / awesome-llm-and-society
Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.
☆50Updated 2 years ago
Lordog / R-Judge
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)
☆91Updated 6 months ago
hyintell / awesome-refreshing-llms
EMNLP'23 survey: a curation of awesome papers and resources on refreshing large language models (LLMs) without expensive retraining.
☆136Updated last year
UKPLab / acl2025-diverse-cot
Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs"
☆33Updated 4 months ago
thu-coai / Targeted-Data-Extraction
Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid…
☆23Updated 2 years ago
declare-lab / red-instruct
Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
☆107Updated last year
RUCAIBox / LLM-Knowledge-Boundary
Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation"
☆82Updated 2 years ago
halfrot / ALaRM
[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"
☆25Updated last year
yinzhangyue / SelfAware
Do Large Language Models Know What They Don’t Know?
☆101Updated last year
Spico197 / awesome-lm-evaluation
🩺 A collection of ChatGPT evaluation reports on various bechmarks.
☆50Updated 2 years ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68Updated 3 years ago
Princeton-SysML / kNNLM_privacy
Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888
☆37Updated last year
Hunter-DDM / knowledge-neurons
Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers"
☆173Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆133Updated last year
CUHK-ARISE / PsychoBench
Benchmarking LLMs' Psychological Portrayal
☆126Updated 10 months ago
ntunlp / LLMSanitize
An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
☆57Updated last year
swj0419 / detect-pretrain-code
This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…
☆233Updated 2 years ago
zijian678 / TDD
☆12Updated last year
RUCAIBox / HaluEval-2.0
☆47Updated last year
PlusLabNLP / Active-IT
Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"
☆25Updated last year
c-box / KnowledgeLifecycle
Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey"
☆59Updated 2 years ago
orhonovich / instruction-induction
☆67Updated 3 years ago
AI21Labs / factor
Code and data for the FACTOR paper
☆52Updated 2 years ago
vr25 / hallucination-foundation-model-survey
A Survey of Hallucination in Large Foundation Models
☆55Updated last year