eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆17Updated 2 months ago
Alternatives and similar repositories for ToolFuzz
Users who are interested in ToolFuzz are comparing it to the libraries listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces.☆37Updated last week
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a…☆56Updated 2 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆52Updated 2 months ago
- Let Claude control a web browser on your machine.☆29Updated this week
- ☆50Updated last week
- ☆26Updated 3 months ago
- Sphynx Hallucination Induction☆54Updated 4 months ago
- The first dense retrieval model that can be prompted like an LM☆73Updated 3 weeks ago
- Harness used to benchmark aider against SWE Bench benchmarks☆72Updated 11 months ago
- Aider's refactoring benchmark exercises based on popular python repos☆73Updated 7 months ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"☆48Updated 2 months ago
- AgentFence is an open-source platform for automatically testing AI agent security. It identifies vulnerabilities such as prompt injection…☆12Updated 3 months ago
- A framework for hosting and scaling AI agents.☆35Updated 6 months ago
- ☆9Updated last week
- Official implementation of the WASP web agent security benchmark☆23Updated 3 weeks ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.☆91Updated last month
- Code interpreter support for o1☆32Updated 8 months ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆69Updated last year
- ☆72Updated 7 months ago
- Test Generation for Prompts☆95Updated this week
- Codebase exploration with AI research agents☆13Updated 3 months ago
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation☆49Updated last month
- A prompt defence is a multi-layer defence that can be used to protect your applications against prompt injection attacks.☆16Updated 7 months ago
- MCP server to manage a Letta server and communicate with agents☆20Updated this week
- Visualize any repo or codebase into diagram or animation☆18Updated 7 months ago
- Unofficial Claude Code SDKs for TypeScript and Python☆15Updated 2 weeks ago
- ☆16Updated 5 months ago
- Easiest way to build custom agents, in a no-code notion style editor, using simple macros.☆27Updated 6 months ago
- ☆15Updated last year
- The Granite Guardian models are designed to detect risks in prompts and responses.☆85Updated 2 months ago