eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆34 · Updated 4 months ago
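ToolFuzz's own API is not documented on this page, so the sketch below is only a rough illustration of the general idea behind fuzzing an agent tool, not ToolFuzz's actual interface: random inputs are thrown at a tool and any crash or malformed return value is recorded as a failure. The tool function, names, and defaults here are all assumptions made up for the example.

```python
# Hypothetical sketch of fuzzing an LLM-agent tool; NOT ToolFuzz's real API.
import random
import string

def example_search_tool(query: str) -> str:
    # Stand-in for a real agent tool; assumed for illustration only.
    if not query.strip():
        raise ValueError("empty query")
    return f"results for {query!r}"

def fuzz_tool(tool, n_trials: int = 200, max_len: int = 64):
    """Call the tool on random printable strings; collect (input, error) pairs."""
    failures = []
    for _ in range(n_trials):
        payload = "".join(
            random.choice(string.printable)
            for _ in range(random.randint(0, max_len))
        )
        try:
            result = tool(payload)
            if not isinstance(result, str):  # agent tools are expected to return text
                failures.append((payload, TypeError(type(result).__name__)))
        except Exception as exc:  # any uncaught exception is a finding
            failures.append((payload, exc))
    return failures

if __name__ == "__main__":
    for payload, exc in fuzz_tool(example_search_tool):
        print(f"input {payload!r} -> {exc!r}")
```

A real framework would add coverage feedback, LLM-generated seed inputs, and checks against the tool's documented schema, but the failure-collection loop above is the core pattern.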
Alternatives and similar repositories for ToolFuzz
Users interested in ToolFuzz are comparing it to the repositories listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 · Updated last month
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆57 · Updated 9 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆78 · Updated 3 months ago
- ☆102 · Updated last year
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆122 · Updated 2 months ago
- Visualize any repo or codebase as a diagram or animation ☆20 · Updated last year
- Let Claude control a web browser on your machine. ☆39 · Updated 6 months ago
- ☆65 · Updated 3 weeks ago
- This repository serves as a comprehensive knowledge hub, curating cutting-edge research papers and developments across 25+ specialized do… ☆88 · Updated last week
- Enhancing AI Software Engineering with Repository-level Code Graph ☆233 · Updated 8 months ago
- Guardrails for secure and robust agent development ☆369 · Updated 4 months ago
- ☆63 · Updated 5 months ago
- Run SWE-bench evaluations remotely ☆44 · Updated 3 months ago
- ☆18 · Updated 11 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆126 · Updated last month
- ☆68 · Updated 11 months ago
- A framework for building large-scale, deterministic, interactive workflows with a fault-tolerant, conversational UX ☆43 · Updated 3 weeks ago
- LLM-based mutation testing ☆11 · Updated 10 months ago
- ☆33 · Updated 2 months ago
- Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing ☆28 · Updated last year
- Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs ☆23 · Updated 5 months ago
- ☆11 · Updated last year
- BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution ☆56 · Updated 2 months ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆72 · Updated 6 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆85 · Updated last week
- ☆42 · Updated 10 months ago
- Harness used to benchmark aider against SWE-bench ☆78 · Updated last year
- Codebase exploration with AI research agents ☆18 · Updated 9 months ago
- Code for WALT – Web Agents that Learn Tools ☆56 · Updated last month
- Top papers related to LLM-based agent evaluation ☆86 · Updated last month