aws-samples / evaluating-large-language-models-using-llm-as-a-judge
★20 Updated 11 months ago
Alternatives and similar repositories for evaluating-large-language-models-using-llm-as-a-judge
Users who are interested in evaluating-large-language-models-using-llm-as-a-judge are comparing it to the repositories listed below.
- ★24 Updated last year
- A simple Streamlit application to visualize document chunks and queries in embedding space ★13 Updated 8 months ago
- ★54 Updated last year
- A method for steering llms to better follow instructions ★74 Updated 5 months ago
- ★46 Updated last year
- ★52 Updated 7 months ago
- ★56 Updated 6 months ago
- Question Answering Generative AI application with Large Language Models (LLMs) and Amazon OpenSearch Service ★28 Updated last year
- Context is Key: Combining Embedding-based Retrieval with LLMs for Comprehensive Knowledge Enrichment ★31 Updated 2 years ago
- ★54 Updated last year
- ★42 Updated last year
- Encountering 14 different Naive RAG fails and using KG to solve it ★15 Updated last month
- Official Repo for CRMArena and CRMArena-Pro ★127 Updated 2 months ago
- Generative AI with Amazon Bedrock, published by Packt ★27 Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★51 Updated last year
- LangChain, Llama2-Chat, and zero- and few-shot prompting are used to generate synthetic datasets for IR and RAG system evaluation ★38 Updated 2 years ago
- Training setup for Langchain's Open Deep Research ★74 Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper. ★80 Updated last year
- ★147 Updated last year
- Streamlit app for recommending eval functions using prompt diffs ★30 Updated 2 years ago
- ★64 Updated 8 months ago
- Writing Blog Posts with Generative Feedback Loops! ★50 Updated last year
- ★23 Updated last month
- ★82 Updated 2 months ago
- ★70 Updated last month
- A semantic research engine to get relevant papers based on a user query. Application frontend with Chainlit Copilot. Observability with L… ★81 Updated last year
- ★17 Updated 9 months ago
- Creating Generative AI Apps which work ★17 Updated 8 months ago
- Large Language Model Hosting Container ★90 Updated 3 months ago
- Exploring limitations of LLM-as-a-judge ★19 Updated last year