invariantlabs-ai / explorer
A better way of testing, inspecting, and analyzing AI Agent traces.
☆29 Updated this week
Alternatives and similar repositories for explorer:
Users interested in explorer are comparing it to the libraries listed below.
- A Ruby on Rails-style framework for the DSPy (Demonstrate, Search, Predict) project for Language Models like GPT, BERT, and LLaMA. ☆122 Updated 4 months ago
- Code interpreter support for o1 ☆32 Updated 6 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆72 Updated last week
- Prompt design in Python ☆55 Updated 3 months ago
- A framework-less approach to robust agent development. ☆156 Updated this week
- Reactive DDD with DSPy ☆22 Updated last year
- Sphynx Hallucination Induction ☆52 Updated last month
- Structured outputs from DSPy and Jinja2 ☆23 Updated 2 months ago
- Using modal.com to process FineWeb-edu data ☆20 Updated last week
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆59 Updated 7 months ago
- Coding problems used in aider's polyglot benchmark ☆65 Updated 2 months ago
- Enables cloud-based AI services to access local Stdio-based MCP servers ☆54 Updated 2 months ago
- Automatic fine-tuning of models with synthetic data ☆74 Updated last year
- QLLM: A powerful CLI for seamless interaction with multiple Large Language Models. Simplify AI workflows, streamline development, and unl… ☆33 Updated 2 weeks ago
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆73 Updated last year
- ☆30 Updated last year
- Simple examples using Argilla tools to build AI ☆53 Updated 3 months ago
- freeact is a lightweight library for code-action based agents ☆71 Updated 2 weeks ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆75 Updated 3 weeks ago
- Announcing Instructor Cloud ☆34 Updated 6 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆49 Updated this week
- Small, simple agent task environments for training and evaluation ☆18 Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 7 months ago
- Aider's refactoring benchmark exercises based on popular Python repos ☆61 Updated 5 months ago
- Dynamic Metadata-based RAG Framework ☆72 Updated 7 months ago
- A new benchmark for measuring LLMs' capability to detect bugs in large codebases. ☆29 Updated 9 months ago
- Embed anything. ☆29 Updated 9 months ago