☆50 · Aug 3, 2024 · Updated last year
Alternatives and similar repositories for redteaming-resistance-benchmark
Users that are interested in redteaming-resistance-benchmark are comparing it to the libraries listed below.
- ☆16 · May 30, 2024 · Updated last year
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆37 · Feb 22, 2025 · Updated last year
- A curated collection of papers and related projects on using LLMs for privacy. ☆30 · Oct 8, 2025 · Updated 6 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Aug 3, 2024 · Updated last year
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions ☆14 · Mar 7, 2026 · Updated last month
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1 ☆13 · Apr 23, 2025 · Updated 11 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆98 · Apr 13, 2025 · Updated last year
- [ACL 2024] Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation ☆10 · May 26, 2024 · Updated last year
- ☆10 · Mar 13, 2023 · Updated 3 years ago
- Automated Safety Testing of Large Language Models ☆18 · Jan 31, 2025 · Updated last year
- Official Code Release for "Training a Generally Curious Agent" ☆46 · May 18, 2025 · Updated 11 months ago
- ☆27 · May 20, 2025 · Updated 11 months ago
- ☆33 · Sep 19, 2025 · Updated 7 months ago
- Tree of Attacks (TAP) Jailbreaking Implementation ☆119 · Feb 7, 2024 · Updated 2 years ago
- Parallel NDJSON Reader for Python ☆17 · Dec 4, 2019 · Updated 6 years ago
- The website of the Public AI Network ☆20 · Mar 12, 2026 · Updated last month
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆354 · Oct 17, 2025 · Updated 6 months ago
- SecureDNA client and server components monorepo ☆17 · Oct 20, 2025 · Updated 5 months ago
- Agent Zero plugins index ☆49 · Apr 12, 2026 · Updated last week
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter". ☆12 · Dec 27, 2023 · Updated 2 years ago
- ATLAS tactics, techniques, and case studies data ☆123 · Mar 31, 2026 · Updated 2 weeks ago
- Open Imi is an open-source Claude desktop alternative for developers, engineers and tech teams to hack MCP's and agents to their own likin… ☆11 · Nov 16, 2025 · Updated 5 months ago
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents ☆31 · Updated this week
- MindBridge is an AI orchestration MCP server that lets any app talk to any LLM (OpenAI, Anthropic, DeepSeek, Ollama, and more) through … ☆31 · Mar 13, 2026 · Updated last month
- An interactive CLI application for interacting with authenticated Jupyter instances. ☆55 · May 7, 2025 · Updated 11 months ago
- ☆10 · Jun 8, 2024 · Updated last year
- [EMNLP 2025 Findings] Familiarity-aware Evidence Compression for Retrieval Augmented Generation ☆15 · Aug 20, 2025 · Updated 7 months ago
- Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024 ☆18 · Mar 25, 2025 · Updated last year
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆924 · Aug 16, 2024 · Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆110 · Mar 8, 2024 · Updated 2 years ago
- Code for paper [Explaining image classifiers by removing input features using generative models] [ACCV 2020] https://arxiv.org/abs/1910.0… ☆15 · Nov 22, 2022 · Updated 3 years ago
- PyTorch implementation of Expectation over Transformation ☆13 · Jul 18, 2025 · Updated 9 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆173 · Mar 8, 2025 · Updated last year
- ☆14 · Mar 23, 2023 · Updated 3 years ago
- Pretty fast parser for probabilistic context free grammars ☆88 · Apr 17, 2013 · Updated 13 years ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆118 · Apr 15, 2024 · Updated 2 years ago
- Privacy backdoors ☆50 · Apr 28, 2024 · Updated last year
- A concise PyTorch implementation of Proximal Policy Optimization (PPO) solving CartPole-v0 ☆16 · Jun 11, 2020 · Updated 5 years ago