aws-samples / evaluating-large-language-models-using-llm-as-a-judge
☆19 · Updated 8 months ago
Alternatives and similar repositories for evaluating-large-language-models-using-llm-as-a-judge
Users who are interested in evaluating-large-language-models-using-llm-as-a-judge are comparing it to the libraries listed below.
- Streamlit app for recommending eval functions using prompt diffs ☆29 · Updated last year
- ☆20 · Updated 11 months ago
- A method for steering LLMs to better follow instructions ☆53 · Updated 2 months ago
- Dynamic Metadata based RAG Framework ☆75 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated last year
- A simple Streamlit application to visualize document chunks and queries in embedding space 🗺️🔍 ☆13 · Updated 5 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆50 · Updated last year
- ☆43 · Updated last year
- ☆40 · Updated 9 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆111 · Updated 6 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated last year
- ☆78 · Updated 8 months ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆45 · Updated last year
- 💙 Unstructured Data Connectors for Haystack 2.0 ☆17 · Updated 2 years ago
- Official Repo for CRMArena and CRMArena-Pro ☆118 · Updated 3 months ago
- ☆55 · Updated 3 months ago
- ☆80 · Updated last year
- Verifiers for LLM Reinforcement Learning ☆74 · Updated 5 months ago
- ☆50 · Updated 4 months ago
- Complete example of how to build an Agentic RAG architecture with Redis, Amazon Bedrock, and LlamaIndex. ☆98 · Updated 10 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆32 · Updated last year
- ☆14 · Updated last year
- Training setup for Langchain's Open Deep Research ☆64 · Updated last month
- ☆50 · Updated last year
- ☆24 · Updated 9 months ago
- LangChain, Llama2-Chat, and zero- and few-shot prompting are used to generate synthetic datasets for IR and RAG system evaluation ☆37 · Updated last year
- ☆146 · Updated last year
- Creating Generative AI Apps which work ☆17 · Updated 5 months ago
- ☆49 · Updated 8 months ago
- ☆95 · Updated 6 months ago