invariantlabs-ai / explorer
A better way of testing, inspecting, and analyzing AI Agent traces.
☆35 Updated this week
Alternatives and similar repositories for explorer
Users who are interested in explorer are comparing it to the libraries listed below.
- Let Claude control a web browser on your machine. ☆28 Updated 3 months ago
- Sphynx Hallucination Induction ☆54 Updated 4 months ago
- Agent computer interface for AI software engineer. ☆79 Updated this week
- Scale your LLM-as-a-judge. ☆226 Updated last week
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆52 Updated 2 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆72 Updated 11 months ago
- Code interpreter support for o1 ☆32 Updated 8 months ago
- AGI SDK ☆53 Updated this week
- Run evals using LLM ☆25 Updated last year
- ToolFuzz is a fuzzing framework designed to test your LLM Agent tools. ☆17 Updated 2 months ago
- A framework for hosting and scaling AI agents. ☆35 Updated 6 months ago
- ☆72 Updated 7 months ago
- Using various instructor clients evaluating the quality and capabilities of extractions and reasoning. ☆51 Updated 8 months ago
- ☆50 Updated 6 months ago
- Reactive DDD with DSPy ☆22 Updated last year
- Guardrails for secure and robust agent development ☆256 Updated 2 weeks ago
- Conduct in-depth research with AI-driven insights: DeepDive is a command-line tool that leverages web searches and AI models to generate… ☆42 Updated 9 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆78 Updated 2 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆60 Updated 10 months ago
- ☆40 Updated 4 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆65 Updated 2 months ago
- Red-Teaming Language Models with DSPy ☆193 Updated 3 months ago
- auto fine tune of models with synthetic data ☆74 Updated last year
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆77 Updated 3 months ago
- A framework for fine-tuning retrieval-augmented generation (RAG) systems. ☆61 Updated this week
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆80 Updated 8 months ago
- ☆57 Updated last week
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆62 Updated this week
- Chat Markup Language conversation library ☆55 Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆53 Updated 3 months ago