ibm-ecosystem-engineering / JudgeIt-LLM-as-a-Judge

An automation framework that uses LLM-as-a-judge to scale evaluation of generative AI solutions (RAG, multi-turn, query rewrite, Text2SQL, etc.) and serves as a good proxy for human judgement.

☆23

Alternatives and similar repositories for JudgeIt-LLM-as-a-Judge:

Users interested in JudgeIt-LLM-as-a-Judge are comparing it to the libraries listed below.

deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆48 Updated 7 months ago
padas-lab-de / ir-rag-sigir24-persona-rag
☆45 Updated 4 months ago
ianhohoho / auto-hyde
🔎 A deep-dive into HyDE for Advanced LLM RAG + 💡 Introducing AutoHyDE, a semi-supervised framework to improve the effectiveness, covera…
☆32 Updated 10 months ago
phunterlau / paper_without_code
An LLM reads a paper and produces a working prototype
☆48 Updated 2 weeks ago
davanstrien / data-for-fine-tuning-llms
☆76 Updated 8 months ago
TIGER-AI-Lab / StructLM
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
☆76 Updated 4 months ago
davanstrien / haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
☆57 Updated 11 months ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆101 Updated 2 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆74 Updated 5 months ago
arcee-ai / DAM
☆48 Updated 3 months ago
weaviate-tutorials / Hurricane
Writing Blog Posts with Generative Feedback Loops!
☆47 Updated 11 months ago
oceanumeric / EnteRAG
A RAG that can scale 🧑🏻‍💻
☆11 Updated 8 months ago
Knowledgator / utca
Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr…
☆28 Updated 2 months ago
smallporridge / AssistRAG
☆24 Updated last month
apple / ml-superposition-prompting
☆141 Updated 7 months ago
pacman100 / peft-codegen-25
☆24 Updated last year
Tebmer / Rereading-LLM-Reasoning
EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…
☆24 Updated 2 months ago
louisbrulenaudet / ragoon
High-level library for batched embeddings generation, blazingly fast web-based RAG, and quantized index processing ⚡
☆64 Updated 3 months ago
patronus-ai / Lynx-hallucination-detection
☆32 Updated 7 months ago
predlico / ARAGOG
ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…
☆102 Updated 10 months ago
gersteinlab / chemagent
☆22 Updated 2 weeks ago
olly-styles / WorkBench
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting.
☆37 Updated 6 months ago
pygongnlp / CoSearchAgent
[SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models
☆22 Updated last year
krypticmouse / matryoshka-representation-learning
PyTorch implementation for MRL
☆18 Updated last year
aymeric-roucher / agent_reasoning_benchmark
🔧 Compare how Agent systems perform on several benchmarks. 📊🚀
☆75 Updated 4 months ago
lancedb / ragged
☆18 Updated 4 months ago
lamm-mit / PRefLexOR
Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning
☆50 Updated last month
Knowledgator / unlimited_classifier
Universal text classifier for generative models
☆22 Updated 6 months ago
SalesforceAIResearch / SFR-RAG
☆73 Updated last month
SciPhi-AI / RAG-Performance
Measuring RAG solutions throughput and latency
☆15 Updated 6 months ago