invariantlabs-ai / invariant
A framework-less approach to robust agent development.
☆154 · Updated this week
Alternatives and similar repositories for invariant:
Users who are interested in invariant are comparing it to the libraries listed below.
- Red-Teaming Language Models with DSPy ☆168 · Updated last week
- Sphynx Hallucination Induction ☆52 · Updated 3 weeks ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆146 · Updated 2 weeks ago
- ☆351 · Updated 2 weeks ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆28 · Updated this week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆78 · Updated this week
- Enhancing AI Software Engineering with Repository-level Code Graph ☆133 · Updated last month
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆93 · Updated 11 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆208 · Updated this week
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆89 · Updated 8 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. ☆89 · Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆161 · Updated last week
- AWM: Agent Workflow Memory ☆242 · Updated 3 weeks ago
- ☆83 · Updated 7 months ago
- Commit0: Library Generation from Scratch ☆125 · Updated 3 weeks ago
- r2e: turn any GitHub repository into a programming agent environment ☆100 · Updated 3 weeks ago
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆92 · Updated 3 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆208 · Updated 9 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆108 · Updated 11 months ago
- [FORGE 2025] Graph-based method for end-to-end code completion with context awareness on repository ☆57 · Updated 5 months ago
- EvoEval: Evolving Coding Benchmarks via LLM ☆66 · Updated 10 months ago
- Code and Data for Tau-Bench ☆273 · Updated 3 weeks ago
- Prototype advanced LLM algorithms for reasoning and planning. ☆96 · Updated 6 months ago
- End-to-end Generative Optimization for AI Agents ☆479 · Updated this week
- Tutorial for building LLM router ☆182 · Updated 7 months ago
- ☆77 · Updated 2 months ago
- 🤖🌊 aiFlows: The building blocks of your collaborative AI ☆245 · Updated 9 months ago
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation ☆35 · Updated this week