chroma-core / generative-benchmarking
☆37 · Updated last month
Alternatives and similar repositories for generative-benchmarking
Users interested in generative-benchmarking are comparing it to the libraries listed below.
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆276 · Updated 10 months ago
- A small library of LLM judges ☆276 · Updated last month
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆391 · Updated this week
- ☆145 · Updated last year
- Inference-time scaling for LLMs-as-a-judge. ☆288 · Updated this week
- ☆118 · Updated last year
- ☆20 · Updated last year
- Benchmark various LLM Structured Output frameworks (Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc.) on task… ☆176 · Updated 11 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆167 · Updated this week
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker ☆114 · Updated this week
- ARAGOG (Advanced RAG Output Grading): exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆109 · Updated last year
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- Source code for the collaborative reasoner research project at Meta FAIR. ☆102 · Updated 4 months ago
- awesome synthetic (text) datasets ☆295 · Updated 2 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆110 · Updated 2 months ago
- This is the repo for the LegalBench-RAG paper: https://arxiv.org/abs/2408.10343 ☆125 · Updated 3 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆138 · Updated 2 weeks ago
- The first dense retrieval model that can be prompted like an LM ☆86 · Updated 3 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆134 · Updated last week
- Collection of resources for RL and Reasoning ☆26 · Updated 7 months ago
- Code for training & evaluating Contextual Document Embedding models ☆197 · Updated 3 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆116 · Updated 11 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated 11 months ago
- Synthetic Data for LLM Fine-Tuning ☆120 · Updated last year
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated 10 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆329 · Updated 3 months ago
- ☆134 · Updated 2 weeks ago
- ☆79 · Updated this week
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆294 · Updated this week
- Vision Document Retrieval (ViDoRe): benchmark and evaluation code for the ColPali paper. ☆233 · Updated last month
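Several entries above rank systems from pairwise judgments, most directly RAGElo's use of an Elo ranker to compare RAG-based agents. A minimal sketch of the underlying Elo update is below; the function names and the starting rating of 1000 are illustrative assumptions, not RAGElo's actual API.

```python
# Minimal Elo update, of the kind used to rank RAG pipelines from
# pairwise judge verdicts. Names here are illustrative, not RAGElo's API.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float,
               score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie
    (e.g. an LLM judge's verdict on two candidate answers).
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two pipelines start at 1000; pipeline A wins one judged comparison.
ra, rb = elo_update(1000.0, 1000.0, 1.0)  # ra=1016.0, rb=984.0
```

Run over many judged pairs, these updates converge to a leaderboard even when no single query is answered by every system, which is why Elo-style ranking suits head-to-head LLM evaluation.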