eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.
☆37 · Updated 6 months ago
Alternatives and similar repositories for ToolFuzz
Users who are interested in ToolFuzz are comparing it to the libraries listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆46 · Updated 2 weeks ago
- Guardrails for secure and robust agent development. ☆383 · Updated 2 weeks ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆79 · Updated 4 months ago
- ☆67 · Updated this week
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆58 · Updated 10 months ago
- Visualize any repo or codebase into diagram or animation. ☆20 · Updated last year
- Run SWE-bench evaluations remotely. ☆50 · Updated 5 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph. ☆248 · Updated 9 months ago
- ☆33 · Updated 4 months ago
- A framework for building large-scale, deterministic, interactive workflows with a fault-tolerant, conversational UX. ☆44 · Updated this week
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆128 · Updated 3 months ago
- Harness used to benchmark aider against SWE Bench benchmarks. ☆78 · Updated last year
- ☆106 · Updated last year
- This repository serves as a comprehensive knowledge hub, curating cutting-edge research papers and developments across 25+ specialized do… ☆92 · Updated 3 weeks ago
- BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution. ☆57 · Updated 3 months ago
- Let Claude control a web browser on your machine. ☆40 · Updated 7 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs. ☆104 · Updated last year
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025). ☆26 · Updated 11 months ago
- Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing. ☆29 · Updated last year
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆26 · Updated last month
- ☆50 · Updated last year
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers". ☆97 · Updated 3 months ago
- Sphynx Hallucination Induction. ☆52 · Updated 11 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆71 · Updated 8 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆100 · Updated 9 months ago
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD). ☆26 · Updated 4 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 · Updated 9 months ago
- ☆61 · Updated 7 months ago
- Red-Teaming Language Models with DSPy. ☆250 · Updated 11 months ago
- A prompt defence is a multi-layer defence that can be used to protect your applications against prompt injection attacks. ☆21 · Updated last month