uiuc-focal-lab / llm-priming-attacks
☆15 · Updated last year
Alternatives and similar repositories for llm-priming-attacks
Users interested in llm-priming-attacks are comparing it to the repositories listed below.
- A certifier for bias in LLMs ☆23 · Updated last month
- FANC: a tool for proof transfer for incomplete verification ☆11 · Updated 3 years ago
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆178 · Updated last week
- RepoQA: Evaluating Long-Context Code Understanding ☆108 · Updated 7 months ago
- Iterate on LLM-based structured generation forward and backward ☆15 · Updated 2 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆141 · Updated 7 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆69 · Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training". ☆105 · Updated last year
- LLM Program Watermarking ☆17 · Updated last year
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆113 · Updated 11 months ago
- Large-Language-Model to Machine Interface project. ☆19 · Updated last year
- r2e: turn any GitHub repository into a programming agent environment ☆124 · Updated last month
- Sphynx Hallucination Induction ☆54 · Updated 4 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆313 · Updated 4 months ago
- EvoEval: Evolving Coding Benchmarks via LLM ☆72 · Updated last year
- Efficient and general syntactical decoding for Large Language Models ☆272 · Updated this week
- Improving Alignment and Robustness with Circuit Breakers ☆208 · Updated 8 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆225 · Updated 8 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆175 · Updated this week
- Harness used to benchmark aider against the SWE-bench benchmark ☆72 · Updated 11 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆51 · Updated 9 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆42 · Updated 10 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆179 · Updated 2 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆185 · Updated 6 months ago