☆50Aug 3, 2024Updated last year
Alternatives and similar repositories for redteaming-resistance-benchmark
Users who are interested in redteaming-resistance-benchmark are comparing it to the libraries listed below.
- LLM red teaming datasets from the paper 'Student-Teacher Prompting for Red Teaming to Improve Guardrails' for the ART of Safety Workshop …☆24Oct 12, 2023Updated 2 years ago
- Fancy upgrade to console.log☆21Jun 1, 2022Updated 3 years ago
- ☆16May 30, 2024Updated last year
- NVIDIA’s repository for enabling trustworthy AI.☆32May 1, 2026Updated last week
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆38Feb 22, 2025Updated last year
- autoredteam: code for training models that automatically red team other language models☆15Aug 9, 2023Updated 2 years ago
- This repo contains a demo of adversarial strings poisoning a vector database and forcing specific hallucinations on a RAG chatbot.☆10May 2, 2024Updated 2 years ago
- A curated collection of papers and related projects on using LLMs for privacy.☆30Oct 8, 2025Updated 7 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)☆12Oct 31, 2024Updated last year
- ☆37May 23, 2023Updated 2 years ago
- Python standalone tokenizer☆15Nov 12, 2015Updated 10 years ago
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions☆14Mar 7, 2026Updated 2 months ago
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1☆13Apr 23, 2025Updated last year
- Netflix for XBMC☆61Nov 13, 2012Updated 13 years ago
- Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning☆22Jul 8, 2024Updated last year
- [ACL 2024] Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation☆10May 26, 2024Updated last year
- Implementation of BEAST adversarial attack for language models (ICML 2024)☆88May 14, 2024Updated last year
- ☆48May 9, 2024Updated 2 years ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- ☆27May 20, 2025Updated 11 months ago
- A command line tool for crawling a website for dead links, permanent and/or fatal redirects, resource load issues, and script errors. It…☆12Apr 16, 2023Updated 3 years ago
- ☆34Sep 19, 2025Updated 7 months ago
- Tree of Attacks (TAP) Jailbreaking Implementation☆120Feb 7, 2024Updated 2 years ago
- Run all the tests at the same time with modal.com☆11Mar 2, 2024Updated 2 years ago
- The website of the Public AI Network☆20May 1, 2026Updated last week
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!☆354Oct 17, 2025Updated 6 months ago
- SecureDNA client and server components monorepo☆19Oct 20, 2025Updated 6 months ago
- ☆57Apr 30, 2026Updated last week
- ☆10Jun 8, 2024Updated last year
- An interactive CLI application for interacting with authenticated Jupyter instances.☆56May 7, 2025Updated last year
- [EMNLP 2025 Findings] Familiarity-aware Evidence Compression for Retrieval Augmented Generation☆15Aug 20, 2025Updated 8 months ago
- Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024☆18Mar 25, 2025Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment☆111Mar 8, 2024Updated 2 years ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆940Aug 16, 2024Updated last year
- [ICLR26] Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs☆33Dec 9, 2025Updated 5 months ago
- 🔐 NTLM authentication for Dart/Flutter.☆15Dec 15, 2023Updated 2 years ago
- Code to reproduce experiments from the EMNLP 2015 paper about Rumour Stance Classification with Gaussian Processes.☆37May 23, 2016Updated 9 years ago
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents☆35Apr 13, 2026Updated 3 weeks ago
- Privacy backdoors☆50Apr 28, 2024Updated 2 years ago