aws-samples / evaluating-large-language-models-using-llm-as-a-judge

☆19

Alternatives and similar repositories for evaluating-large-language-models-using-llm-as-a-judge

Users who are interested in evaluating-large-language-models-using-llm-as-a-judge are comparing it to the repositories listed below.

patronus-ai / Lynx-hallucination-detection
☆41Updated last year
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49Updated last year
weaviate-tutorials / Hurricane
Writing Blog Posts with Generative Feedback Loops!
☆50Updated last year
SalesforceAIResearch / CRMArena
Official Repo for CRMArena and CRMArena-Pro
☆104Updated last month
awslabs / extending-the-context-length-of-open-source-llms
☆56Updated last month
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111Updated 3 months ago
enguard-ai / awesome-ai-guardails
A curated list of materials on AI guardrails
☆39Updated 2 months ago
langchain-ai / prompt-eval-recommendation
Streamlit app for recommending eval functions using prompt diffs
☆29Updated last year
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79Updated 10 months ago
S1M0N38 / dspy-arxiv
Explore the use of DSPy for extracting features from PDFs 🔎
☆45Updated last year
davanstrien / data-for-fine-tuning-llms
☆79Updated last year
miralab-ai / autoreason
☆40Updated 7 months ago
microsoft / llm-steer-instruct
A method for steering LLMs to better follow instructions
☆48Updated 3 weeks ago
ali-bahrainian / RAG_best_practices
☆93Updated 4 months ago
SalesforceAIResearch / SFR-RAG
☆77Updated 6 months ago
lancedb / ragged
☆20Updated 9 months ago
padas-lab-de / ir-rag-sigir24-persona-rag
☆47Updated 10 months ago
PrithivirajDamodaran / SPLADERunner
Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…
☆32Updated 11 months ago
jjovalle99 / agentic-design-patterns
☆14Updated last year
TIGER-AI-Lab / StructLM
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
☆75Updated 9 months ago
mrmps / ai-chunker
Chunk your text more accurately using gpt4o-mini
☆44Updated last year
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68Updated 3 months ago
apple / ml-superposition-prompting
☆145Updated last year
explodinggradients / nemesis
Reward Model framework for LLM RLHF
☆61Updated 2 years ago
davanstrien / haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
☆65Updated last year
mddunlap924 / LangChain-SynData-RAG-Eval
LangChain, Llama2-Chat, and zero- and few-shot prompting are used to generate synthetic datasets for IR and RAG system evaluation
☆37Updated last year
yai333 / Text-to-SQL-GRPO-Fine-tuning-Pipeline
This repository contains a pipeline for fine-tuning Large Language Models (LLMs) for Text-to-SQL conversion using General Reward Proximal…
☆32Updated 3 months ago
v-prgmr / mergekit
Tools for merging pretrained large language models.
☆19Updated last year
aws-samples / llm-evaluation-methodology
☆44Updated 9 months ago
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆33Updated 2 months ago