☆50Aug 3, 2024Updated last year
Alternatives and similar repositories for redteaming-resistance-benchmark
Users that are interested in redteaming-resistance-benchmark are comparing it to the libraries listed below.
- ☆16May 30, 2024Updated last year
- NVIDIA’s repository for enabling trustworthy AI.☆32Apr 7, 2026Updated last week
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆37Feb 22, 2025Updated last year
- autoredteam: code for training models that automatically red team other language models☆14Aug 9, 2023Updated 2 years ago
- This repo contains a demo of adversarial strings poisoning a vector database and forcing specific hallucinations on a RAG chatbot.☆10May 2, 2024Updated last year
- A curated collection of papers and related projects on using LLMs for privacy.☆30Oct 8, 2025Updated 6 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)☆12Oct 31, 2024Updated last year
- Open Source Auth Built on Freestyle: own your auth + data https://docs.freestyle.dev/guides/authentication/☆23Jun 12, 2024Updated last year
- Python standalone tokenizer☆15Nov 12, 2015Updated 10 years ago
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions☆14Mar 7, 2026Updated last month
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1☆13Apr 23, 2025Updated 11 months ago
- ☆14Dec 19, 2024Updated last year
- EmojiNotion – project for CBMI 2025 conference☆14Jun 6, 2025Updated 10 months ago
- ☆10Mar 13, 2023Updated 3 years ago
- Automated Safety Testing of Large Language Models☆18Jan 31, 2025Updated last year
- ☆48May 9, 2024Updated last year
- Implementation of BEAST adversarial attack for language models (ICML 2024)☆88May 14, 2024Updated last year
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"☆11Jun 18, 2024Updated last year
- Official Code Release for "Training a Generally Curious Agent"☆46May 18, 2025Updated 11 months ago
- ☆15Jun 7, 2024Updated last year
- ☆27May 20, 2025Updated 11 months ago
- A command line tool for crawling a website for dead links, permanent and/or fatal redirects, resource load issues, and script errors. It…☆12Apr 16, 2023Updated 3 years ago
- ☆34Sep 19, 2025Updated 7 months ago
- Tree of Attacks (TAP) Jailbreaking Implementation☆119Feb 7, 2024Updated 2 years ago
- Run all the tests at the same time with modal.com☆11Mar 2, 2024Updated 2 years ago
- Parallel NDJSON Reader for Python☆17Dec 4, 2019Updated 6 years ago
- Agent Zero (agent-zero.ai) extensions for ethical penetration testing☆20Sep 10, 2025Updated 7 months ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!☆354Oct 17, 2025Updated 6 months ago
- MCP easy installer is a robust mcp server with tools to search, install, configure, repair and uninstall MCP servers☆17Apr 19, 2025Updated last year
- A crowd-sourced public tracker of bias audits of automated employment decision tools (AEDTs) released by employers related to NYC's Local…☆18Nov 5, 2024Updated last year
- SecureDNA client and server components monorepo☆17Oct 20, 2025Updated 6 months ago
- Agent Zero plugins index☆49Apr 12, 2026Updated last week
- ATLAS tactics, techniques, and case studies data☆124Mar 31, 2026Updated 2 weeks ago
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".☆12Dec 27, 2023Updated 2 years ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆924Aug 16, 2024Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment☆110Mar 8, 2024Updated 2 years ago
- Code for paper [Explaining image classifiers by removing input features using generative models] [ACCV 2020] https://arxiv.org/abs/1910.0…☆15Nov 22, 2022Updated 3 years ago
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs☆33Dec 9, 2025Updated 4 months ago
- 🔐 NTLM authentication for Dart/Flutter.☆15Dec 15, 2023Updated 2 years ago