uiuc-focal-lab / llm-priming-attacks
☆13 · Updated last year
Alternatives and similar repositories for llm-priming-attacks:
Users interested in llm-priming-attacks are comparing it to the libraries listed below.
- EvoEval: Evolving Coding Benchmarks via LLM ☆66 · Updated 10 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆41 · Updated 6 months ago
- ☆74 · Updated last year
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆125 · Updated 4 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆132 · Updated last month
- RepoQA: Evaluating Long-Context Code Understanding ☆102 · Updated 3 months ago
- Large-Language-Model to Machine Interface project. ☆17 · Updated last year
- ☆46 · Updated 7 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆93 · Updated 11 months ago
- FANC is a tool for the proof transfer of incomplete verification ☆10 · Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆184 · Updated 4 months ago
- r2e: turn any github repository into a programming agent environment ☆100 · Updated 2 weeks ago
- Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud… ☆22 · Updated 3 weeks ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆206 · Updated 9 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆66 · Updated 11 months ago
- Efficient and general syntactical decoding for Large Language Models ☆232 · Updated this week
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts ☆29 · Updated 7 months ago
- ☆83 · Updated 7 months ago
- Certified Reasoning with Language Models ☆31 · Updated last year
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation ☆34 · Updated last week
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆73 · Updated last year
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation ☆295 · Updated 3 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆115 · Updated 8 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆261 · Updated 3 weeks ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆180 · Updated 4 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Updated 9 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆109 · Updated 8 months ago
- ☆153 · Updated 5 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆49 · Updated 6 months ago