jamesmurdza / agenteval
Automated testing and benchmarking for code generation agents.
☆17 Updated last year
Related projects
Alternatives and complementary repositories for agenteval
- LLM reads a paper and produces a working prototype ☆36 Updated last week
- ☆24 Updated last year
- Track the progress of LLM context utilisation ☆53 Updated 4 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆43 Updated 8 months ago
- ☆54 Updated this week
- Using multiple LLMs for ensemble Forecasting ☆16 Updated 10 months ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆33 Updated 8 months ago
- ☆75 Updated 5 months ago
- OpenMindedChatbot is a Proof Of Concept that leverages the power of Open source Large Language Models (LLM) with Function Calling capabil… ☆28 Updated 11 months ago
- ☆33 Updated last year
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths ☆35 Updated 11 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆37 Updated 7 months ago
- Simple examples using Argilla tools to build AI ☆40 Updated this week
- ☆35 Updated last year
- ☆41 Updated 2 weeks ago
- Evaluating LLMs with CommonGen-Lite ☆85 Updated 8 months ago
- Conduct consumer interviews with synthetic focus groups using LLMs and LangChain ☆43 Updated last year
- ☆64 Updated 5 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 Updated 8 months ago
- ☆57 Updated last year
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆20 Updated 9 months ago
- A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ ☆63 Updated last year
- Using modal.com to process FineWeb-edu data ☆19 Updated 2 months ago
- Official homepage for "Self-Harmonized Chain of Thought" ☆83 Updated 2 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆61 Updated 2 weeks ago
- ☆48 Updated last year
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 Updated last year
- Chat Markup Language conversation library ☆54 Updated 10 months ago
- ☆48 Updated last year