patronus-ai / financebench
☆228 · Updated last year
Alternatives and similar repositories for financebench
Users who are interested in financebench are comparing it to the repositories listed below
- Knowledge Graph Retrieval Augmented Generation (KG-RAG) Eval Datasets ☆190 · Updated last year
- Comprehensive benchmark for RAG ☆245 · Updated 5 months ago
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆166 · Updated last year
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆235 · Updated 2 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆160 · Updated last week
- ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆114 · Updated last year
- Repository for "MultiHop-RAG: A Dataset for Evaluating Retrieval-Augmented Generation Across Documents" (COLM 2024) ☆397 · Updated 8 months ago
- This is the repo for the LegalBench-RAG Paper: https://arxiv.org/abs/2408.10343. ☆143 · Updated 6 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆179 · Updated last year
- ☆148 · Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆305 · Updated last year
- ☆99 · Updated 2 months ago
- Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data" ☆335 · Updated 3 years ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆242 · Updated last year
- Sample notebooks and prompts for LLM evaluation ☆156 · Updated last month
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆502 · Updated 10 months ago
- An open science effort to benchmark legal reasoning in foundation models ☆518 · Updated last year
- Automated Evaluation of RAG Systems ☆678 · Updated 8 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆444 · Updated last year
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆193 · Updated 3 months ago
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆137 · Updated 2 years ago
- awesome synthetic (text) datasets ☆314 · Updated 3 weeks ago
- The official repository for the paper: Evaluation of Retrieval-Augmented Generation: A Survey. ☆183 · Updated 7 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆116 · Updated 4 months ago
- Benchmarking library for RAG ☆248 · Updated 2 months ago
- Repository for “PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers”, NAACL24 ☆152 · Updated last year
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆215 · Updated last year
- A small library of LLM judges ☆306 · Updated 4 months ago
- This repository implements the chain of verification paper by Meta AI ☆182 · Updated 2 years ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆253 · Updated 4 months ago