IBM / eval-assist
EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refining evaluation criteria in a web-based user experience.
☆59 Updated this week
Alternatives and similar repositories for eval-assist
Users who are interested in eval-assist are comparing it to the libraries listed below.
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆206 Updated this week
- Chunk your text using gpt4o-mini more accurately ☆44 Updated 11 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆113 Updated this week
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆105 Updated 3 months ago
- ☆77 Updated last year
- ☆40 Updated last year
- A method for steering LLMs to better follow instructions ☆47 Updated last week
- A framework for fine-tuning retrieval-augmented generation (RAG) systems. ☆122 Updated this week
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated 9 months ago
- Generalist and Lightweight Model for Text Classification ☆139 Updated last month
- Analysis on the cost of encoder based models ☆11 Updated 5 months ago
- Synthetic Text Dataset Generation for LLM projects ☆33 Updated this week
- all code examples in the blog posts ☆22 Updated 5 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆49 Updated last year
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" ☆71 Updated 7 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆163 Updated this week
- ☆145 Updated 11 months ago
- Build Enterprise RAG (Retrieval-Augmented Generation) Pipelines to tackle various Generative AI use cases with LLMs by simply plugging co… ☆112 Updated 11 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆111 Updated 10 months ago
- A curated list of materials on AI guardrails ☆39 Updated last month
- Repository for the paper "MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance" ☆19 Updated 4 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆31 Updated 10 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆101 Updated 3 weeks ago
- ☆48 Updated last year
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆32 Updated 3 months ago
- ☆37 Updated last year
- Plug-and-play document processing pipelines with zero-shot models. ☆69 Updated 2 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆65 Updated last year
- codebase release for EMNLP2023 paper publication ☆19 Updated 2 months ago