eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆30 · Updated 3 months ago
Alternatives and similar repositories for ToolFuzz
Users who are interested in ToolFuzz are comparing it to the repositories listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 · Updated this week
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆69 · Updated last month
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆119 · Updated 2 weeks ago
- Guardrails for secure and robust agent development ☆354 · Updated 2 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆57 · Updated 7 months ago
- Let Claude control a web browser on your machine. ☆39 · Updated 4 months ago
- Official implementation of the WASP web agent security benchmark ☆51 · Updated 2 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆217 · Updated 6 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆92 · Updated 10 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆166 · Updated last year
- Visualize any repo or codebase as a diagram or animation ☆20 · Updated last year
- A novel approach to improve the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆69 · Updated 5 months ago
- Red-Teaming Language Models with DSPy ☆221 · Updated 8 months ago
- ☆101 · Updated last year
- Run SWE-bench evaluations remotely ☆41 · Updated 2 months ago
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆172 · Updated 6 months ago
- Agent-computer interface for AI software engineers. ☆111 · Updated last month
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents ☆52 · Updated 3 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆119 · Updated 4 months ago
- AgentFence is an open-source platform for automatically testing AI agent security. It identifies vulnerabilities such as prompt injection… ☆27 · Updated 7 months ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆72 · Updated 3 months ago
- ☆167 · Updated last week
- ☆34 · Updated 7 months ago
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD) ☆24 · Updated last month
- ☆26 · Updated last year
- ☆48 · Updated last year
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆87 · Updated 2 weeks ago
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆75 · Updated last year
- ☆117 · Updated 4 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆54 · Updated 3 months ago