cambridgeltl / ClaPSLinks

Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning (Zhou et al.; EMNLP 2023 Findings)

☆17

Alternatives and similar repositories for ClaPS

Users that are interested in ClaPS are comparing it to the libraries listed below

Sorting:

psunlpgroup / ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
☆30Updated last year
yfqiu-nlp / sea-llm
Code for the paper "Spectral Editing of Activations for Large Language Model Alignments"
☆28Updated 10 months ago
GXimingLu / IPA
Codebase for Inference-Time Policy Adapters
☆24Updated 2 years ago
technion-cs-nlp / LLMsKnow
☆80Updated 9 months ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68Updated 3 years ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆59Updated last year
HazyResearch / skill-it
Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models
☆47Updated 2 years ago
tatsu-lab / test_set_contamination
☆41Updated 2 years ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆138Updated 4 months ago
XuandongZhao / weak-to-strong
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆88Updated 6 months ago
dunzeng / MORE
Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment
☆16Updated last year
roeehendel / icl_task_vectors
☆98Updated 2 years ago
JacobPfau / fillerTokens
☆75Updated last year
JonasGeiping / carving
Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
☆69Updated last year
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆84Updated last year
arazd / ProgressivePrompts
Progressive Prompts: Continual Learning for Language Models
☆97Updated 2 years ago
ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆118Updated last year
feradauto / MoralCoT
Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
☆38Updated 2 years ago
thunlp / Modularity-Analysis
Repo for ACL2023 Findings paper "Emergent Modularity in Pre-trained Transformers"
☆25Updated 2 years ago
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆99Updated last year
architsharma97 / dpo-rlaif
☆100Updated last year
mega002 / ff-layers
The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…
☆99Updated 4 years ago
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆80Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆169Updated last month
meg-tong / sycophancy-eval
datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆94Updated 2 years ago
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆82Updated last year
ChenmienTan / malmen
☆35Updated last year
lil-lab / icrl
☆29Updated 8 months ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆98Updated 2 years ago