ibm-self-serve-assets / JudgeIt-LLM-as-a-Judge
Automation framework that uses LLM-as-a-judge to scale evaluation of generative AI solutions (RAG, multi-turn, query rewrite, Text2SQL, etc.) as a good proxy for human judgement.
☆27Updated 4 months ago
Alternatives and similar repositories for JudgeIt-LLM-as-a-Judge
Users who are interested in JudgeIt-LLM-as-a-Judge compare it to the libraries listed below.
- 🔎 A deep-dive into HyDE for Advanced LLM RAG + 💡 Introducing AutoHyDE, a semi-supervised framework to improve the effectiveness, covera…☆32Updated last year
- Large-language Model Evaluation framework with Elo Leaderboard and A-B testing☆52Updated 6 months ago
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments☆55Updated 2 months ago
- ☆93Updated 8 months ago
- DSPy in action with open-source LLMs.☆71Updated last year
- Codebase accompanying the Summary of a Haystack paper.☆78Updated 7 months ago
- ☆45Updated 7 months ago
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]☆82Updated 4 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆49Updated 10 months ago
- ☆28Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆76Updated 6 months ago
- Create a knowledge graph out of unstructured legal text - use said knowledge graph in a graph augmented retrieval augmented generation pipe…☆45Updated 7 months ago
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance☆73Updated 5 months ago
- Lighter, cheaper and faster RAG toolkit (Graph RAG) supported by TargetPilot☆45Updated 7 months ago
- ☆40Updated 3 months ago
- Testing speed and accuracy of RAG with and without a Cross Encoder Reranker.☆48Updated last year
- LangChain, Llama2-Chat, and zero- and few-shot prompting are used to generate synthetic datasets for IR and RAG system evaluation☆36Updated last year
- ☆52Updated 10 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆96Updated last year
- ☆67Updated 8 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆110Updated 8 months ago
- Reward Model framework for LLM RLHF☆61Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.☆76Updated 6 months ago
- Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks".☆132Updated 2 weeks ago
- LLM reads a paper and produces a working prototype☆57Updated last month
- ☆77Updated 11 months ago
- ☆19Updated last month
- Repository for “PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers”, NAACL24☆137Updated 11 months ago
- ☆27Updated last month
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆53Updated this week