eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆25 Updated 3 weeks ago
Alternatives and similar repositories for ToolFuzz
Users who are interested in ToolFuzz are comparing it to the libraries listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 Updated last month
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆62 Updated 5 months ago
- Visualize any repo or codebase into diagram or animation ☆20 Updated 9 months ago
- Guardrails for secure and robust agent development ☆329 Updated 2 weeks ago
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD) ☆21 Updated last week
- ☆56 Updated 3 weeks ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆198 Updated 4 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆54 Updated 5 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆23 Updated 4 months ago
- Let Claude control a web browser on your machine. ☆36 Updated 2 months ago
- Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing ☆26 Updated 8 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆93 Updated last week
- LLM-based mutation testing ☆11 Updated 6 months ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆70 Updated 2 weeks ago
- Official Repo for CRMArena and CRMArena-Pro ☆104 Updated last month
- ☆96 Updated 11 months ago
- A Model Context Protocol server for Python code analysis with Claude. Again, works with warning now. I'm missing something here. ☆13 Updated 7 months ago
- ☆11 Updated 9 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆53 Updated 3 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆95 Updated 3 months ago
- Test Generation for Prompts ☆116 Updated this week
- AI-powered computer control for automated testing. Factifai uses vision models (Claude, GPT-4o, Gemini) to interact with applications nat… ☆43 Updated last month
- Harness used to benchmark aider against SWE Bench benchmarks ☆71 Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆59 Updated 8 months ago
- Examples on how to Integrate Google A2A Protocol and Claude MCP protocol in Java applications ☆23 Updated 2 months ago
- AgentFence is an open-source platform for automatically testing AI agent security. It identifies vulnerabilities such as prompt injection… ☆17 Updated 5 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 Updated 4 months ago
- [NAACL2025] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications ☆117 Updated last month
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆73 Updated 11 months ago
- Codebase exploration with AI research agents ☆15 Updated 5 months ago