IBM / eval-assist
EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of other LLMs' output by helping users iteratively refine evaluation criteria in a web-based user experience.
☆67 · Updated this week
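EvalAssist itself is a web-based tool; the snippet below is only a minimal, generic sketch of the LLM-as-a-Judge pattern it supports, not EvalAssist's API. The OpenAI client, model name, and rubric text are illustrative assumptions.

```python
# Generic LLM-as-a-Judge sketch (illustrative only -- not EvalAssist's API).
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# A hypothetical evaluation criterion of the kind a user would iteratively refine.
CRITERION = (
    "Rate the RESPONSE for faithfulness to the SOURCE on a 1-5 scale, "
    "where 1 = contradicts the source and 5 = fully supported by the source. "
    "Reply with the number only."
)

def judge(source: str, response: str, model: str = "gpt-4o-mini") -> int:
    """Ask a judge model to score one response against the criterion."""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": CRITERION},
            {"role": "user", "content": f"SOURCE:\n{source}\n\nRESPONSE:\n{response}"},
        ],
        temperature=0,  # deterministic scoring
    )
    return int(completion.choices[0].message.content.strip())

# Example: score a generated summary against its source document.
# print(judge(source="...", response="..."))
```

In practice the criterion prompt is the part you iterate on, which is the workflow EvalAssist's UI is built around.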
Alternatives and similar repositories for eval-assist
Users interested in eval-assist are comparing it to the libraries listed below.
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated last year
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆206 · Updated this week
- Chunk your text using gpt4o-mini more accurately ☆44 · Updated last year
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆113 · Updated 5 months ago
- Synthetic Text Dataset Generation for LLM projects ☆37 · Updated 2 weeks ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated 11 months ago
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments ☆231 · Updated this week
- Official Repo for CRMArena and CRMArena-Pro ☆110 · Updated 2 months ago
- Generalist and Lightweight Model for Text Classification ☆156 · Updated 2 months ago
- ☆145 · Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆114 · Updated this week
- A framework for fine-tuning retrieval-augmented generation (RAG) systems. ☆128 · Updated this week
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆32 · Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆65 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ☆50 · Updated last year
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation ☆33 · Updated last month
- A method for steering LLMs to better follow instructions ☆50 · Updated 3 weeks ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆33 · Updated 2 weeks ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… ☆176 · Updated 11 months ago
- ☆80 · Updated last year
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆46 · Updated last year
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆105 · Updated last year
- ☆20 · Updated last year
- A curated list of materials on AI guardrails ☆40 · Updated 3 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆102 · Updated 4 months ago
- Collection of resources for RL and Reasoning ☆26 · Updated 7 months ago
- A RAG that can scale 🧑🏻‍💻 ☆11 · Updated last year
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs … ☆59 · Updated 6 months ago
- ☆76 · Updated 7 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆138 · Updated 2 weeks ago