amazon-science / auto-rag-evalLinks

Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"

☆83

Alternatives and similar repositories for auto-rag-eval

Users that are interested in auto-rag-eval are comparing it to the libraries listed below

Sorting:

amazon-science / RefChecker
RefChecker provides automatic checking pipeline and benchmark dataset for detecting fine-grained hallucinations generated by Large Langua…
☆397Updated 5 months ago
chentong0 / factoid-wiki
Dense X Retrieval: What Retrieval Granularity Should We Use?
☆163Updated last year
ParticleMedia / RAGTruth
Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models"
☆205Updated 11 months ago
facebookresearch / CRAG
Comprehensive benchmark for RAG
☆231Updated 4 months ago
nlpyang / geval
Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
☆388Updated last year
castorini / rank_llm
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
☆545Updated this week
stanford-futuredata / ARES
Automated Evaluation of RAG Systems
☆667Updated 7 months ago
naver / bergen
Benchmarking library for RAG
☆238Updated 3 weeks ago
chaitanyamalaviya / ExpertQA
[Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers
☆135Updated last year
alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use…
☆145Updated 2 weeks ago
zetaalphavector / RAGElo
RAGElo is a set of tools that helps you selecting the best RAG-based LLM agents by using an Elo ranker
☆120Updated this week
apple / ml-superposition-prompting
☆146Updated last year
patronus-ai / financebench
☆224Updated 10 months ago
DaoD / INTERS
This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning"
☆204Updated 10 months ago
potsawee / selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
☆571Updated last year
apple / ToolSandbox
☆221Updated last year
wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆116Updated last week
night-chen / ToolQA
ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …
☆282Updated 2 years ago
MadryLab / context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
☆294Updated last year
patronus-ai / Lynx-hallucination-detection
☆43Updated last year
rungalileo / hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
☆115Updated 3 months ago
spcl / MRAG
Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"
☆230Updated last month
KarelDO / xmc.dspy
In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.
☆442Updated last year
yuh-zha / AlignScore
ACL2023 - AlignScore, a metric for factual consistency evaluation.
☆138Updated last year
asahi417 / lm-question-generation
Multilingual/multidomain question generation datasets, models, and python library for question generation.
☆364Updated last year
davanstrien / awesome-synthetic-datasets
awesome synthetic (text) datasets
☆305Updated 3 months ago
daniel-furman / sft-demos
Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.
☆78Updated last year
microsoft / MS-MARCO-Web-Search
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
☆345Updated 10 months ago
xfactlab / orpo
Official repository for ORPO
☆463Updated last year
philschmid / sagemaker-huggingface-llama-2-samples
☆89Updated 2 years ago