philschmid / evaluate-llms
Includes examples on how to evaluate LLMs
☆23 Updated 6 months ago
Alternatives and similar repositories for evaluate-llms:
Users who are interested in evaluate-llms are comparing it to the repositories listed below.
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated 9 months ago
- ☆29 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆77 Updated 7 months ago
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆103 Updated last year
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆108 Updated 7 months ago
- ☆77 Updated 11 months ago
- ☆19 Updated 6 months ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆47 Updated last year
- Sample notebooks and prompts for LLM evaluation ☆124 Updated this week
- Testing speed and accuracy of RAG with, and without Cross Encoder Reranker. ☆48 Updated last year
- ☆24 Updated last year
- This repository contains the source code for running llamaindex tutorials from https://howaibuildthis.substack.com/ ☆41 Updated last year
- Simple examples using Argilla tools to build AI ☆52 Updated 5 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆47 Updated last year
- A reasoning assistant for your STEM education ☆19 Updated last month
- ☆15 Updated last year
- ☆74 Updated 3 months ago
- Test LLMs automatically with Giskard and CI/CD ☆30 Updated 9 months ago
- A semantic research engine to get relevant papers based on a user query. Application frontend with Chainlit Copilot. Observability with L… ☆82 Updated last year
- GenAI Experimentation ☆58 Updated 2 weeks ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 Updated last year
- LlamaWorksDB is a Retrieval Augmented Generation (RAG) product designed to interact with the documentation of various products such as Ll… ☆16 Updated last year
- ☆143 Updated 9 months ago
- Simple AI agents / assistants ☆45 Updated 7 months ago
- Running load tests on a FastAPI application using Locust ☆15 Updated last month
- Dynamic Metadata based RAG Framework ☆75 Updated 9 months ago
- ☆18 Updated last year
- Applying Evaluation Driven Development (EDD) to aid in the design decision of RAG pipelines ☆31 Updated last year
- Generate Tools and Toolkits from any Python SDK -- no extra code required ☆50 Updated 6 months ago
- This project involves using llamaindex Multi Agents concierge system and Qdrant vector database to customize the RAG application with use… ☆50 Updated 8 months ago