prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4
★794 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for prometheus-eval
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★1,612 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★787 · Updated last week
- ★445 · Updated last week
- Generative Representational Instruction Tuning ★562 · Updated this week
- Automatically evaluate your LLMs in Google Colab ★556 · Updated 6 months ago
- ReFT: Representation Finetuning for Language Models ★1,145 · Updated this week
- Official repository for ORPO ★420 · Updated 5 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,042 · Updated this week
- Automated Evaluation of RAG Systems ★479 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,026 · Updated last week
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ★692 · Updated last month
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… ★476 · Updated this week
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ★874 · Updated last week
- RAGChecker: A Fine-grained Framework For Diagnosing RAG ★528 · Updated last month
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ★467 · Updated 4 months ago
- awesome synthetic (text) datasets ★239 · Updated last week
- A reading list on LLM based Synthetic Data Generation ★761 · Updated this week
- Best practices for distilling large language models. ★392 · Updated 9 months ago
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. ★838 · Updated 3 months ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ★502 · Updated 3 weeks ago
- Code for Quiet-STaR ★639 · Updated 2 months ago
- A library for easily merging multiple LLM experts, and efficiently train the merged LLM. ★401 · Updated 2 months ago
- List of papers on hallucination detection in LLMs. ★669 · Updated last week
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. ★1,797 · Updated last week
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ★318 · Updated 3 weeks ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ★764 · Updated this week
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ★332 · Updated 2 months ago
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d… ★286 · Updated 11 months ago
- An Open Source Toolkit For LLM Distillation ★350 · Updated last month
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ★868 · Updated last month