invariantlabs-ai / invariant
Helps you build better AI agents through debuggable unit testing
☆141 Updated this week
Alternatives and similar repositories for invariant:
Users who are interested in invariant are comparing it to the libraries listed below:
- Red-Teaming Language Models with DSPy ☆153 Updated 9 months ago
- Sphynx Hallucination Induction ☆51 Updated 5 months ago
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆133 Updated this week
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆22 Updated this week
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆182 Updated this week
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆89 Updated 2 months ago
- ☆319 Updated this week
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆202 Updated this week
- Synthetic Data for LLM Fine-Tuning ☆107 Updated last year
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆88 Updated 7 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆122 Updated last week
- Automatic Evals for Instruction-Tuned Models ☆100 Updated this week
- r2e: turn any GitHub repository into a programming agent environment ☆94 Updated 2 weeks ago
- Python SDK for running evaluations on LLM generated responses ☆253 Updated last week
- AWM: Agent Workflow Memory ☆231 Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆154 Updated 2 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆163 Updated 5 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆92 Updated 10 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆125 Updated 9 months ago
- Commit0: Library Generation from Scratch ☆122 Updated 3 weeks ago
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆189 Updated this week
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆114 Updated 7 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆174 Updated 10 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆108 Updated 10 months ago
- ☆82 Updated 6 months ago
- Tutorial for building an LLM router ☆170 Updated 5 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆68 Updated 3 months ago
- ☆115 Updated this week
- Code and Data for Tau-Bench ☆253 Updated last week
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆459 Updated 9 months ago