redteaming-arena / redteam-arena
☆31 Updated last week
Alternatives and similar repositories for redteam-arena:
Users who are interested in redteam-arena are comparing it to the repositories listed below.
- Sphynx Hallucination Induction ☆52 Updated last month
- Modify Entropy Based Sampling to work with Mac Silicon via MLX ☆50 Updated 3 months ago
- ☆97 Updated 4 months ago
- ☆20 Updated 4 months ago
- look how they massacred my boy ☆63 Updated 4 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆89 Updated 8 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆63 Updated 4 months ago
- Red-Teaming Language Models with DSPy ☆170 Updated 2 weeks ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 4 months ago
- ☆38 Updated 7 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆93 Updated 11 months ago
- ☆20 Updated 4 months ago
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆31 Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆167 Updated last month
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user… ☆165 Updated this week
- ☆122 Updated 3 weeks ago
- ☆60 Updated last month
- ☆50 Updated last year
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆79 Updated this week
- ☆48 Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated last month
- The next evolution of Agents ☆49 Updated 3 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆84 Updated 5 months ago
- ☆28 Updated 11 months ago