ibm-self-serve-assets / JudgeIt-LLM-as-a-Judge
Automation framework that uses LLM-as-a-judge, as a good proxy for human judgement, to scale evaluation of Gen AI solutions (RAG, multi-turn, query rewrite, Text2SQL, etc.).
★27 Updated 4 months ago
Alternatives and similar repositories for JudgeIt-LLM-as-a-Judge
Users who are interested in JudgeIt-LLM-as-a-Judge are comparing it to the libraries listed below.
- ★46 Updated 8 months ago
- A deep-dive into HyDE for Advanced LLM RAG + 💡 Introducing AutoHyDE, a semi-supervised framework to improve the effectiveness, covera… ★32 Updated last year
- The code for LexDrafter framework: a framework that assists in drafting Definitions articles for legislative documents using retrieval au… ★11 Updated 3 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★49 Updated 10 months ago
- Codebase accompanying the Summary of a Haystack paper. ★78 Updated 8 months ago
- This repository contains a pipeline for fine-tuning Large Language Models (LLMs) for Text-to-SQL conversion using General Reward Proximal… ★25 Updated last month
- ★49 Updated 7 months ago
- Measuring RAG solutions throughput and latency ★17 Updated 10 months ago
- Lighter, cheaper and faster RAG toolkit (Graph RAG) supported by TargetPilot ★45 Updated 7 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ★77 Updated 7 months ago
- ★24 Updated 5 months ago
- ★20 Updated last month
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation ★19 Updated last week
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments ★61 Updated last week
- Compare how Agent systems perform on several benchmarks. ★97 Updated 7 months ago
- Code for Retrieval augmented text-to-SQL generation for epidemiological question answering using electronic health records ★19 Updated last year
- ★40 Updated 4 months ago
- LLM reads a paper and produces a working prototype ★57 Updated last month
- ★45 Updated 2 weeks ago
- Code for the paper, From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process ★15 Updated 8 months ago
- ★28 Updated last year
- [EMNLP 2024] TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation ★28 Updated 2 months ago
- ★28 Updated 4 months ago
- ★41 Updated 5 months ago
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models". ★75 Updated 2 months ago
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance ★75 Updated 6 months ago
- DSPY on action with OpenSource LLMs. ★70 Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ★105 Updated last month
- ★68 Updated 8 months ago
- Deep Research through Multi-Agents, using GraphRAG ★71 Updated 6 months ago