eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆37 · Updated 6 months ago
Alternatives and similar repositories for ToolFuzz
Users interested in ToolFuzz are comparing it to the repositories listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆46 · Updated 3 weeks ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆79 · Updated 5 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆248 · Updated 10 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆58 · Updated 10 months ago
- Let Claude control a web browser on your machine. ☆40 · Updated 8 months ago
- Guardrails for secure and robust agent development ☆384 · Updated 3 weeks ago
- Visualize any repo or codebase into a diagram or animation ☆20 · Updated last year
- ☆106 · Updated last year
- Official implementation of the WASP web agent security benchmark ☆67 · Updated 5 months ago
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆81 · Updated last year
- Run SWE-bench evaluations remotely ☆51 · Updated 5 months ago
- ☆68 · Updated 2 weeks ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆130 · Updated 4 months ago
- ☆32 · Updated last year
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆105 · Updated last year
- ☆18 · Updated last year
- This repository serves as a comprehensive knowledge hub, curating cutting-edge research papers and developments across 25+ specialized do… ☆92 · Updated last month
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆184 · Updated last year
- A platform for building reliable AI agents ☆89 · Updated 3 weeks ago
- Harness used to benchmark aider against SWE-bench benchmarks ☆79 · Updated last year
- Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules" ☆49 · Updated 2 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆425 · Updated this week
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆83 · Updated 6 months ago
- Red-Teaming Language Models with DSPy ☆250 · Updated 11 months ago
- LLM proxy to observe and debug what your AI agents are doing. ☆64 · Updated 3 months ago
- Source code for the paper "INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing" ☆29 · Updated last year
- Test Generation for Prompts ☆150 · Updated this week
- ☆42 · Updated last year
- BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution ☆58 · Updated 3 months ago
- SWE Arena ☆35 · Updated 7 months ago