uiuc-focal-lab / llm-priming-attacks
☆15 · Updated last year
Alternatives and similar repositories for llm-priming-attacks
Users interested in llm-priming-attacks are comparing it to the repositories listed below.
- A certifier for bias in LLMs ☆23 · Updated last month
- FANC: a tool for proof transfer for incomplete verification ☆11 · Updated 3 years ago
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆178 · Updated last week
- RepoQA: Evaluating Long-Context Code Understanding ☆108 · Updated 7 months ago
- Iterate on LLM-based structured generation forward and backward ☆15 · Updated 2 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆141 · Updated 7 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆69 · Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training". ☆105 · Updated last year
- LLM Program Watermarking ☆17 · Updated last year
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆113 · Updated 11 months ago
- Large-Language-Model to Machine Interface project. ☆19 · Updated last year
- r2e: turn any GitHub repository into a programming agent environment ☆124 · Updated last month
- Sphynx Hallucination Induction ☆54 · Updated 4 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆313 · Updated 4 months ago
- EvoEval: Evolving Coding Benchmarks via LLM ☆72 · Updated last year
- Efficient and general syntactical decoding for Large Language Models ☆272 · Updated this week
- Improving Alignment and Robustness with Circuit Breakers ☆208 · Updated 8 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆225 · Updated 8 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆175 · Updated this week
- Harness used to benchmark aider against the SWE-bench benchmark ☆72 · Updated 11 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆51 · Updated 9 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆42 · Updated 10 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆179 · Updated 2 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆185 · Updated 6 months ago