eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆17 Updated 2 months ago
Alternatives and similar repositories for ToolFuzz
Users who are interested in ToolFuzz are comparing it to the libraries listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆35 Updated this week
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆55 Updated 2 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆52 Updated 2 months ago
- Guardrails for secure and robust agent development ☆252 Updated this week
- ☆26 Updated 2 months ago
- ☆42 Updated 2 weeks ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆45 Updated last month
- Let Claude control a web browser on your machine. ☆28 Updated 2 months ago
- ☆15 Updated 4 months ago
- LLM proxy to observe and debug what your AI agents are doing. ☆20 Updated this week
- Challenges for general-purpose web-browsing AI agents ☆53 Updated this week
- Visualize any repo or codebase into diagram or animation ☆18 Updated 7 months ago
- ☆50 Updated 5 months ago
- ☆29 Updated 3 months ago
- LLM-based mutation testing ☆11 Updated 3 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆76 Updated 5 months ago
- Code for ScribeAgent paper ☆57 Updated 2 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆11 Updated last month
- EvoEval: Evolving Coding Benchmarks via LLM ☆70 Updated last year
- ☆11 Updated 8 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆59 Updated 3 months ago
- ☆108 Updated last week
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆66 Updated last year
- ☆100 Updated 2 months ago
- Test Generation for Prompts ☆80 Updated this week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆54 Updated 5 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆108 Updated 6 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆81 Updated last month
- ☆38 Updated 2 months ago
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments ☆55 Updated 2 months ago