chroma-core / generative-benchmarking
☆48Updated 2 months ago
Alternatives and similar repositories for generative-benchmarking
Users who are interested in generative-benchmarking are comparing it to the libraries listed below.
- Collection of resources for RL and Reasoning☆27Updated last year
- Python library to use Pleias-RAG models☆68Updated 9 months ago
- Official Repo for CRMArena and CRMArena-Pro☆132Updated this week
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.☆174Updated this week
- Inference-time scaling for LLMs-as-a-judge.☆328Updated 3 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker☆126Updated 3 months ago
- Simple UI for debugging correlations of text embeddings☆305Updated 8 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use…☆176Updated 2 weeks ago
- The first dense retrieval model that can be prompted like an LM☆90Updated 9 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information.☆319Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization☆72Updated last year
- Banishing LLM Hallucinations Requires Rethinking Generalization☆277Updated last year
- awesome synthetic (text) datasets☆321Updated last month
- ☆147Updated last year
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.☆181Updated 9 months ago
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs …☆61Updated last year
- A small library of LLM judges☆321Updated 6 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle☆302Updated last month
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).☆82Updated last year
- Official code for NeurIPS 2025 paper "AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise"☆126Updated 2 weeks ago
- Synthetic Text Dataset Generation for LLM projects☆55Updated 2 months ago
- ☆141Updated 5 months ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction☆84Updated last year
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…☆113Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"☆88Updated 11 months ago
- ☆162Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…☆184Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles☆61Updated 9 months ago
- Simple examples using Argilla tools to build AI☆57Updated last year
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.☆140Updated 5 months ago