jamesmurdza / agenteval
Automated testing and benchmarking for code generation agents.
☆18 · Updated last year
Alternatives and similar repositories for agenteval:
Users who are interested in agenteval are comparing it to the libraries listed below.
- Track the progress of LLM context utilisation ☆53 · Updated 8 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆47 · Updated last year
- Using multiple LLMs for ensemble forecasting ☆16 · Updated last year
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆39 · Updated last year
- ☆76 · Updated 9 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- ☆48 · Updated 4 months ago
- Evaluating LLMs with CommonGen-Lite ☆89 · Updated last year
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 · Updated 11 months ago
- ☆24 · Updated last year
- LLM reads a paper and produces a working prototype ☆51 · Updated last week
- A version of BabyAGI using DSPy and typed predictors ☆17 · Updated last year
- A clone of OpenAI's Tokenizer page for HuggingFace Models ☆45 · Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 10 months ago
- ☆48 · Updated last year
- Not financial advice. ☆28 · Updated 2 years ago
- ☆60 · Updated last year
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths ☆35 · Updated last year
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci… ☆24 · Updated last year
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆33 · Updated last year
- ☆31 · Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆29 · Updated 6 months ago
- ☆32 · Updated last year
- ☆37 · Updated last year
- ☆84 · Updated last year
- Using modal.com to process FineWeb-edu data ☆20 · Updated 2 weeks ago
- ☆20 · Updated last year
- ☆57 · Updated last year