llm-attacks / llm-attacks
Universal and Transferable Attacks on Aligned Language Models
☆4,446 · Updated last year
Alternatives and similar repositories for llm-attacks
Users interested in llm-attacks are comparing it to the libraries listed below.
- New ways of breaking app-integrated LLMs ☆2,036 · Updated 6 months ago
- ☆679 · Updated 6 months ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆835 · Updated last year
- A curation of awesome tools, documents and projects about LLM Security. ☆1,503 · Updated 5 months ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,808 · Updated 7 months ago
- ☆4,112 · Updated last year
- [CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak… ☆3,523 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,712 · Updated 2 months ago
- ☆1,064 · Updated last year
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ☆4,914 · Updated last year
- Papers and resources related to the security and privacy of LLMs 🤖 ☆557 · Updated 7 months ago
- A unified evaluation framework for large language models ☆2,771 · Updated 3 months ago
- [ICLR 2024] The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M… ☆421 · Updated 11 months ago
- [NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models ☆5,792 · Updated last year
- An easy-to-use Python framework to generate adversarial jailbreak prompts. ☆803 · Updated 9 months ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆345 · Updated 3 months ago
- Utilities for decoding deep representations (like sentence embeddings) back to text ☆1,047 · Updated 3 weeks ago
- [NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning ☆3,035 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆939 · Updated last year
- Alpaca dataset from Stanford, cleaned and curated ☆1,581 · Updated 2 years ago
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,767 · Updated 2 months ago
- LLM Prompt Injection Detector ☆1,396 · Updated last year
- Official implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" ☆2,580 · Updated last year
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,933 · Updated 5 months ago
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆563 · Updated last year
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,460 · Updated 6 months ago
- A language for constraint-guided and efficient LLM programming. ☆4,126 · Updated 7 months ago
- [ACL 2024] An easy-to-use Knowledge Editing Framework for LLMs. ☆2,688 · Updated last month
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆514 · Updated 9 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,407 · Updated last year