qiuhuachuan / latent-jailbreak
☆40Updated last year
Alternatives and similar repositories for latent-jailbreak
Users who are interested in latent-jailbreak compare it to the repositories listed below.
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.☆89Updated last year
- [ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"☆96Updated last year
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models"☆47Updated 2 years ago
- Feeling confused about super alignment? Here is a reading list☆43Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆93Updated 7 months ago
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"☆69Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model☆70Updated 3 years ago
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion☆59Updated last year
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"☆109Updated last year
- Do Large Language Models Know What They Don’t Know?☆102Updated last year
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.☆44Updated last year
- ☆48Updated last year
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆134Updated last year
- Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid…☆23Updated 2 years ago
- On Transferability of Prompt Tuning for Natural Language Processing☆100Updated last year
- ☆190Updated 2 years ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.☆50Updated 2 years ago
- ☆143Updated 2 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation"☆82Updated 2 years ago
- Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"☆20Updated last year
- Code and data for the FACTOR paper☆52Updated 2 years ago
- [IJCAI 2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection☆90Updated last year
- Contrastive Chain-of-Thought Prompting☆68Updated 2 years ago
- [ACL 2024] SALAD benchmark & MD-Judge☆167Updated 9 months ago
- Lightweight tool to identify Data Contamination in LLMs evaluation☆53Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).☆58Updated last year
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)☆64Updated 2 years ago
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…☆237Updated 2 years ago
- Collection of papers for scalable automated alignment.☆94Updated last year
- ☆28Updated last year