eth-sri / ToolFuzz
ToolFuzz is a fuzzing framework designed to test your LLM agent tools.
☆16 · Updated last month
Alternatives and similar repositories for ToolFuzz:
Users who are interested in ToolFuzz are comparing it to the libraries listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆35 · Updated this week
- Let Claude control a web browser on your machine. ☆26 · Updated 2 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆51 · Updated last month
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆54 · Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆63 · Updated last month
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆78 · Updated last month
- Guardrails for secure and robust agent development ☆237 · Updated last week
- A structured framework for defining, verifying and certifying AI systems. ☆11 · Updated last month
- ☆38 · Updated 2 weeks ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆155 · Updated 3 weeks ago
- ☆72 · Updated 6 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆70 · Updated 9 months ago
- LLM proxy to observe and debug what your AI agents are doing. ☆15 · Updated this week
- ☆75 · Updated 5 months ago
- ☆25 · Updated last month
- A framework for hosting and scaling AI agents. ☆33 · Updated 5 months ago
- SWE Arena ☆33 · Updated last week
- Probably one of the lightest native RAG + Agent apps out there; experience the power of Agent-powered models and Agent-driven knowledge ba… ☆25 · Updated last week
- An open source MCP proxy. ☆8 · Updated 3 months ago
- ☆15 · Updated last year
- ☆38 · Updated last month
- Visualize any repo or codebase as a diagram or animation. ☆17 · Updated 6 months ago
- Test suite for validating MCP server implementations against the open MCP protocol specification. Helps developers ensure protocol compli… ☆34 · Updated 2 weeks ago
- Red-Teaming Language Models with DSPy ☆183 · Updated 2 months ago
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆68 · Updated 7 months ago
- ☆92 · Updated 7 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆76 · Updated 2 months ago
- Agent computer interface for AI software engineers. ☆63 · Updated this week
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆78 · Updated last month
- Source code for the paper "INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing" ☆26 · Updated 5 months ago