chroma-core / generative-benchmarking
☆48 · Updated 2 months ago
Alternatives and similar repositories for generative-benchmarking
Users who are interested in generative-benchmarking are comparing it to the libraries listed below.
- Collection of resources for RL and Reasoning ☆27 · Updated 11 months ago
- Python library to use Pleias-RAG models ☆68 · Updated 8 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆132 · Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆80 · Updated last year
- The first dense retrieval model that can be prompted like an LM ☆90 · Updated 8 months ago
- ☆147 · Updated last year
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆174 · Updated this week
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆103 · Updated 5 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆317 · Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆97 · Updated 3 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 · Updated last year
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 · Updated last year
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆276 · Updated last year
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆82 · Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆114 · Updated 9 months ago
- Generalist and Lightweight Model for Text Classification ☆168 · Updated 2 weeks ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆113 · Updated last year
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs … ☆61 · Updated 11 months ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆84 · Updated last year
- Learning to route instances for Human vs AI Feedback (ACL Main '25) ☆26 · Updated 6 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆126 · Updated 2 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆326 · Updated 2 months ago
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆56 · Updated 5 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆223 · Updated last month
- ☆59 · Updated last year
- awesome synthetic (text) datasets ☆321 · Updated 2 weeks ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆141 · Updated 5 months ago
- ☆91 · Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆51 · Updated last year
- ☆210 · Updated 7 months ago