eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆25 · Updated last month
Alternatives and similar repositories for ToolFuzz
Users who are interested in ToolFuzz are comparing it to the repositories listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 · Updated last month
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆64 · Updated 3 weeks ago
- Let Claude control a web browser on your machine. ☆36 · Updated 2 months ago
- Visualize any repo or codebase into diagram or animation ☆20 · Updated 10 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆104 · Updated 3 weeks ago
- Guardrails for secure and robust agent development ☆338 · Updated last month
- AgentFence is an open-source platform for automatically testing AI agent security. It identifies vulnerabilities such as prompt injection… ☆20 · Updated 5 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆54 · Updated 5 months ago
- ☆98 · Updated 11 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆207 · Updated 5 months ago
- Red-Teaming Language Models with DSPy ☆212 · Updated 6 months ago
- ☆57 · Updated last month
- ☆30 · Updated 5 months ago
- Test Generation for Prompts ☆131 · Updated last week
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD) ☆22 · Updated last month
- ☆72 · Updated 10 months ago
- Examples on how to Integrate Google A2A Protocol and Claude MCP protocol in Java applications ☆24 · Updated 3 months ago
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆74 · Updated last year
- Source code for paper: INTERVENOR : Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing ☆26 · Updated 9 months ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆78 · Updated last week
- AI-powered computer control for automated testing. Factifai uses vision models (Claude, GPT-4o, Gemini) to interact with applications nat… ☆45 · Updated 2 months ago
- Easiest way to build custom agents, in a no-code notion style editor, using simple macros. ☆34 · Updated 9 months ago
- ☆30 · Updated last year
- Official Repo for CRMArena and CRMArena-Pro ☆110 · Updated 2 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆54 · Updated 3 months ago
- Run SWE-bench evaluations remotely ☆40 · Updated 2 weeks ago
- Jigsawstack Python SDK ☆17 · Updated last week
- Official implementation of the WASP web agent security benchmark ☆44 · Updated 3 weeks ago
- ☆45 · Updated last year
- Codebase exploration with AI research agents ☆15 · Updated 6 months ago