mickelliu/selfplay-redteaming


mickelliu / selfplay-redteaming

☆36

Alternatives and similar repositories for selfplay-redteaming

Users that are interested in selfplay-redteaming are comparing it to the libraries listed below.


xsddys / TRACE
TRACE, a framework for turn-aware credit assignment for multi-turn jailbreak optimization
☆19 · Jun 22, 2026 · Updated 3 weeks ago
ariahw / rl-rewardhacking
☆44 · Feb 18, 2026 · Updated 5 months ago
koayon / phil-interp-papers
A curated reading list for researchers in the Philosophy of Interpretability
☆17 · Aug 17, 2025 · Updated 11 months ago
Lslland / T-Vaccine
☆19 · Jun 21, 2025 · Updated last year
scaleapi / mrt
https://scale.com/research/mrt
☆20 · Mar 16, 2026 · Updated 4 months ago
haeggee / hot-mess-of-ai
☆35 · Feb 2, 2026 · Updated 5 months ago
nicoladainese96 / code-world-models
Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24.
☆20 · Feb 21, 2025 · Updated last year
THUIR / THUIR-website
THUIR website
☆10 · Feb 23, 2026 · Updated 4 months ago
ethz-spylab / unlearning-vs-safety
☆27 · Oct 6, 2024 · Updated last year
AI45Lab / MAGIC
Code for paper "MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM safety"
☆51 · May 11, 2026 · Updated 2 months ago
saferlhf-v / saferlhf-v
☆23 · Jun 16, 2025 · Updated last year
RPC2 / AutoInject
☆20 · Jun 12, 2026 · Updated last month
EleutherAI / deep-ignorance
☆20 · Jan 7, 2026 · Updated 6 months ago
open-rsm / vr
A library of replicated state machine algorithms based on Viewstamped Replication Revisited
☆13 · Feb 6, 2021 · Updated 5 years ago
poloclub / llm-landscape
NeurIPS'24 - LLM Safety Landscape
☆40 · Oct 21, 2025 · Updated 9 months ago
HumanCompatibleAI / interpreting-rewards
Experiments in applying interpretability techniques to learned reward functions.
☆10 · Dec 11, 2020 · Updated 5 years ago
UCSB-NLP-Chang / causal_unlearn
[EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"
☆35 · Jul 22, 2024 · Updated last year
yuki-younai / MTSA
Official implementation of MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
☆16 · Jun 2, 2025 · Updated last year
uw-syfi / vibesys
Can AI Agents Build Bespoke Systems?
☆84 · Updated this week
TeunvdWeij / sandbagging
☆20 · Nov 15, 2024 · Updated last year
ZhangShiyue / extractive_is_not_faithful
☆17 · May 19, 2023 · Updated 3 years ago
Red-Hat-AI-Innovation-Team / its_hub
A Python library for inference-time scaling LLMs
☆36 · Updated this week
zou-group / humanlm
HumanLM: Simulating Users with State Alignment Beats Response Imitation
☆84 · Jun 4, 2026 · Updated last month
BIU-NLP / iFACETSUM
Corpus exploration platform using advanced tools such as interactive summarization and multi-document coreference resolution
☆12 · Jun 15, 2023 · Updated 3 years ago
amuta / DDPG-MountainCarContinuous-v0
Solving the OpenAI Gym (MountainCarContinuous-v0) with DDPG
☆21 · Jan 23, 2023 · Updated 3 years ago
keing1 / reward-hack-generalization
Datasets used in the paper "Reward hacking behavior can generalize across tasks"
☆15 · Aug 17, 2025 · Updated 11 months ago
safety-research / A3
☆19 · Dec 29, 2025 · Updated 6 months ago
RUCAIBox / FIGA
[ICLR 2024] Official implementation of the paper "Beyond imitation: Leveraging fine-grained quality signals for alignment"
☆10 · May 5, 2024 · Updated 2 years ago
abdulhaim / LMRL-Gym
☆116 · Jul 2, 2024 · Updated 2 years ago
Lyz1213 / BadEdit
☆38 · Oct 17, 2024 · Updated last year
SjJ1017 / CiteLab
The predecessor of CiteLab.
☆18 · Feb 3, 2026 · Updated 5 months ago
safety-research / false-facts
☆50 · Jul 4, 2025 · Updated last year
tdietert / lambda-pi
A toy implementation of the dependently typed lambda calculus known as λΠ
☆12 · Jan 29, 2020 · Updated 6 years ago
andrewchambers / hafs
A high-availability distributed filesystem built on FoundationDB and FUSE.
☆22 · Mar 13, 2023 · Updated 3 years ago
qiaoguanren / Multi-Modal-Inverse-Constrained-Reinforcement-Learning
[NeurIPS 2023] Official implementation of "Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations"
☆13 · Feb 19, 2024 · Updated 2 years ago
aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
Lysxia / quickcheck-higherorder
QuickCheck extension for higher-order properties
☆19 · Feb 14, 2022 · Updated 4 years ago
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆266 · Sep 24, 2024 · Updated last year
Stanford-ILIAD / ILEED
Companion code for ICML 2022 paper "Imitation Learning by Estimating Expertise of Demonstrators"
☆11 · Jul 5, 2023 · Updated 3 years ago