invariantlabs-ai / invariant
A framework-less approach to robust agent development.
☆156 · Updated this week
Alternatives and similar repositories for invariant:
Users interested in invariant are comparing it to the libraries listed below.
- Red-Teaming Language Models with DSPy ☆175 · Updated last month
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆30 · Updated this week
- Scaling inference-time compute for LLM-as-a-judge, automated evaluations, guardrails, and reinforcement learning. ☆189 · Updated last week
- Sphynx Hallucination Induction ☆53 · Updated last month
- ☆367 · Updated last month
- Enhancing AI Software Engineering with Repository-level Code Graph ☆146 · Updated 2 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆89 · Updated 9 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆96 · Updated last year
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆112 · Updated this week
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆155 · Updated 2 weeks ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆129 · Updated last week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆83 · Updated this week
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆98 · Updated 4 months ago
- r2e: turn any GitHub repository into a programming agent environment ☆105 · Updated 3 weeks ago
- Prompt engineering, automated. ☆288 · Updated 4 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆108 · Updated last year
- [NeurIPS 2023 D&B] Code repository for the InterCode benchmark: https://arxiv.org/abs/2306.14898 ☆210 · Updated 10 months ago
- ☆71 · Updated 5 months ago
- A DSPy-based implementation of the tree-of-thoughts method (Yao et al., 2023) for generating persuasive arguments ☆75 · Updated 5 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Updated 7 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆50 · Updated 2 weeks ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆165 · Updated 2 weeks ago
- Collection of evals for Inspect AI ☆97 · Updated this week
- Python SDK for running evaluations on LLM-generated responses ☆272 · Updated last week
- ☆106 · Updated this week
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆131 · Updated last year
- Commit0: Library Generation from Scratch ☆139 · Updated 2 weeks ago
- Let Claude control a web browser on your machine. ☆17 · Updated 3 weeks ago
- ☆86 · Updated 3 weeks ago
- ☆87 · Updated 8 months ago