jamesmurdza / agenteval
Automated testing and benchmarking for code generation agents.
☆18 Updated last year
Alternatives and similar repositories for agenteval:
Users who are interested in agenteval compare it to the libraries listed below.
- Conduct consumer interviews with synthetic focus groups using LLMs and LangChain ☆43 Updated last year
- ☆57 Updated last year
- ☆76 Updated 7 months ago
- ☆48 Updated last year
- Track the progress of LLM context utilisation ☆53 Updated 6 months ago
- ☆24 Updated last year
- An OpenAI Completions API compatible server for NLP transformers models ☆60 Updated last year
- A framework for evaluating function calls made by LLMs ☆36 Updated 5 months ago
- ☆46 Updated 2 months ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆37 Updated 10 months ago
- ☆75 Updated 11 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆46 Updated 9 months ago
- ☆20 Updated 11 months ago
- Using multiple LLMs for ensemble Forecasting ☆16 Updated last year
- ☆65 Updated 7 months ago
- Simple examples using Argilla tools to build AI ☆52 Updated 2 months ago
- ☆37 Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" ☆88 Updated last month
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆70 Updated last year
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆31 Updated 11 months ago
- LLM reads a paper and produces a working prototype ☆46 Updated 2 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 6 months ago
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci… ☆24 Updated last year
- Python Server for C3 AI app. A project that brings the power of Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) with… ☆22 Updated last year
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆21 Updated last month
- Using modal.com to process FineWeb-edu data ☆19 Updated last month
- A seamless matchmaking application that is programmed with Cohere Command R+, Stanford NLP DSPy framework, Weaviate Vector store and Crew… ☆59 Updated 8 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆58 Updated 6 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆29 Updated 3 months ago