haizelabs / dspy-redteamLinks

Red-Teaming Language Models with DSPy

☆203

Alternatives and similar repositories for dspy-redteam

Users that are interested in dspy-redteam are comparing it to the libraries listed below

Sorting:

haizelabs / get-haized
A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.
☆95Updated 3 months ago
haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆267Updated 3 weeks ago
haizelabs / bijection-learning
☆24Updated 9 months ago
haizelabs / sphynx
Sphynx Hallucination Induction
☆53Updated 6 months ago
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆108Updated 3 months ago
andyzorigin / cybench
☆127Updated last month
invariantlabs-ai / invariant
Guardrails for secure and robust agent development
☆327Updated last week
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆198Updated this week
zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆87Updated 10 months ago
princeton-pli / hal-harness
☆102Updated this week
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆103Updated last week
cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆162Updated 2 months ago
MadryLab / context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
☆268Updated 9 months ago
Ziems / arbor
A framework for optimizing DSPy programs with RL
☆96Updated this week
lve-org / lve
A repository of Language Model Vulnerabilities and Exposures (LVEs).
☆113Updated last year
invariantlabs-ai / explorer
A better way of testing, inspecting, and analyzing AI Agent traces.
☆39Updated 3 weeks ago
PrimeIntellect-ai / genesys
☆130Updated 4 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119Updated last year
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88Updated 10 months ago
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆111Updated last year
ZenGuard-AI / fast-llm-security-guardrails
The fastest Trust Layer for AI Agents
☆140Updated 2 months ago
hwchase17 / langfuzz
☆71Updated 9 months ago
javirandor / anthropic-tokenizer
Approximation of the Claude 3 tokenizer by inspecting generation stream
☆131Updated last year
redteaming-arena / redteam-arena
☆34Updated last month
ServiceNow / TapeAgents
TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle
☆289Updated last week
hwchase17 / adversarial-prompts
Curation of prompts that are known to be adversarial to large language models
☆184Updated 2 years ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆157Updated 3 months ago
MinorJerry / OpenWebVoyager
☆78Updated 9 months ago
emergent-misalignment / emergent-misalignment
☆182Updated 4 months ago
rungalileo / hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
☆113Updated last week